METHODS OF APPORTIONING SEATS IN THE HOUSE
OF REPRESENTATIVES 


Walter F. Willcox*
Cornell University 


The subject of Congressional Apportionment was transferred a few
years ago to the Judiciary Committees of Congress; no member of
either committee, I believe, was in Washington in the late twenties
when the long fight over the question ended, with the dangerous
increase in the size of the House halted and with reapportionment made
a ministerial act about which Congress need no longer concern itself
after each census, as it had done for 130 years.

In view of that situation the Judiciary Committee of the House 
asked me to report to them on methods of apportionment and to in- 
clude so much of the congressional history of the subject as might help 
them in dealing with it. My reply to the committee was sent some 
months ago; it ended by proposing two amendments to the automatic 
act of 1929, one of them changing the present method to a novel one 
which I now think the best, the other lopping off ten members auto- 
matically after each future census from an overgrown House until it 
approaches a membership of three hundred, the number mentioned 
often in congressional debates as a desirable goal. This article will 
supplement my report to Congress and lay before scholars the main 
results of my work, which has lasted for more than half a century, a
period during which some of my early conclusions have changed and
several new ones have emerged.

I had to prepare apportionment tables for Congress after the census 
of 1900. After the following census two groups of students entered the 
field. One, for whom Allyn A. Young spoke, held that the question of 
method “is mathematical and ought to be decided upon the basis of a 
general consensus of mathematical opinion.”¹ The position of the other
group was stated thus: “The question involved in the apportionment of
representatives is primarily one of constitutional law. The role of
mathematics in the problem is to make clear the consequences of any
given interpretation of the Constitution.”²

* Editor's Note: Professor Willcox was President of the American Statistical Association in 1912
and of the American Economic Association in 1915.
1 Allyn Young, circular letter to statisticians and mathematicians dated Nov. 19, 1926 (unprinted).

This article will defend the second position. My understanding of the 
difference between the two groups was stated in a letter to an oppo- 
nent: “Our conference left me with the impression that all the men 
present except Professor X and me believed that there is only one right 
method of apportionment, the method of equal proportions. ... My 
view is that the methods can be arranged in an order of preferability, 
that their sequence in that order depends upon the predominant object 
to be secured by apportionment, that the object of apportionment is a 
political rather than a mathematical problem and one to be determined,
therefore, not by academic students but by Congress.”³

Those who wrote the Constitution intended, I believe, to make the 
resident of a state the unit of representation in the House, as they had 
made the state itself the unit in the Senate. The Constitution contains 
three passages which bear on the method by which to carry out that 
intention. They are: 

(1) “The number of Representatives shall not exceed one for every 
thirty thousand.” 

(2) “Representatives shall be apportioned among the several States 
according to their respective numbers.” 

(3) “Each State shall have at least one Representative.” 

The first passage alone determined the method of apportionment
used by Congress before 1840; the second alone has underlain the four
methods used since 1840; the second, modified by the third, furnishes a
basis for the novel method proposed herein. My arguments are
addressed primarily to Congress because in deciding this question
Congress is the jury, but I cannot succeed in that quarter without
support from the public; hence this article.

The House of Representatives at first had only 65 members, but 
Congress soon became convinced that it should be enlarged as much as 
possible. The limit on size set in the first passage proved to be ambigu- 
ous; for five months Senate and House disputed over how to interpret 
it. Did it mean not more than one for every thirty thousand in each 
state, 112 in all, or not more than one for every thirty thousand in all 
the states, 120 in all? The House sent on to the Senate a bill
apportioning 112 representatives. The Senate returned it after adding
eight seats for large remainders, and in that form it reached the
President. Washington vetoed it, relying on the advice of Jefferson,
then Secretary of State and in charge of the census, that the method
was unconstitutional. That veto established the method which Congress
used for half a century; it may be called the Jefferson method or
method of rejected fractions.⁴ Its essential feature is that it
apportions representatives only for units.

2 F. W. Owens, “On the Apportionment of Representatives,” American Statistical Association
Publications, 17 (1921), p. 968.
3 Walter F. Willcox, circular letter of Aug. 12, 1931 (unprinted).

This method was abandoned forty years later as a result of Webster’s
criticism. Under the discarded method, the larger a state, the smaller
its rejected fraction in proportion to its population, hence the smaller
its district population and the more representatives it would get
relative to its numbers. Under 1950 figures, for example, the method of
rejected fractions would have transferred to states above the average
size nine seats which the present method gave to states below the
average size.
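The bias described above can be seen in a small sketch of the method of rejected fractions. The ratio of 30,000 and the two state populations are hypothetical, chosen so that both states reject the same 29,000 persons; the function name is likewise illustrative, and the sketch assumes every state exceeds the ratio.

```python
def rejected_fractions(populations, ratio):
    """Seats, rejected remainder and district population for each state.

    Assumes every population is at least `ratio`, so each state gets a seat.
    """
    results = []
    for pop in populations:
        seats = pop // ratio             # one seat per whole unit; the fraction is dropped
        rejected = pop - seats * ratio   # the "rejected fraction", in persons
        results.append((seats, rejected, pop / seats))
    return results


if __name__ == "__main__":
    pops = [59_000, 599_000]             # hypothetical small and large state
    for pop, (seats, rejected, district) in zip(pops, rejected_fractions(pops, 30_000)):
        print(f"{pop:>7,}: {seats:2} seats, rejected {rejected:,} "
              f"({rejected / pop:.1%} of population), district {district:,.0f}")
```

The same rejected remainder is nearly half of the small state's population but under a twentieth of the large state's, so the large state ends with a much smaller district population.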

Among the tests of a method, one needs to be mentioned here because
it underlies Webster’s method. First divide the states into three groups,
large, small and very small, the line separating large from small being
the average population of those two groups, and the line separating
small from very small being whether a state gets its one seat for its
population or only by constitutional guarantee. Next compute the average
district population of each of the two groups of large and small states.
The nearer those averages are to each other, the better the method.

This test also measures how nearly equal the representation given to
one person or one million persons is, wherever they reside; that is
probably what those who wrote the Constitution had in mind.
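The grouping test can be sketched as follows, under stated assumptions: the populations (in thousands) and the two competing apportionments are hypothetical, the "very small" states are passed in as a flag rather than derived, and a group's average district population is taken as its total population divided by its total seats.

```python
def group_district_averages(populations, seats, guaranteed_only):
    """Average district population of the 'large' and 'small' groups.

    States flagged in guaranteed_only (their one seat comes only from the
    constitutional guarantee) form the 'very small' group and are left out.
    The remaining states are split at their mean population.
    """
    rest = [(p, s) for p, s, g in zip(populations, seats, guaranteed_only) if not g]
    cutoff = sum(p for p, _ in rest) / len(rest)
    large = [(p, s) for p, s in rest if p > cutoff]
    small = [(p, s) for p, s in rest if p <= cutoff]

    def average(group):
        # Group population divided by group seats.
        return sum(p for p, _ in group) / sum(s for _, s in group)

    return average(large), average(small)


if __name__ == "__main__":
    pops = [5300, 3300, 1400, 300]          # hypothetical, in thousands
    very_small = [False, False, False, True]
    for seats in ([6, 3, 1, 1], [5, 3, 2, 1]):
        large, small = group_district_averages(pops, seats, very_small)
        print(seats, round(large, 1), round(small, 1), "gap", round(abs(large - small), 1))
```

By this test the second hypothetical apportionment is the better one, because its two group averages lie nearer each other.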

Webster’s proposal was to give supplementary seats for fractions 
larger than one half, “major fractions.” An apportionment bill had 
passed the House after the 1830 census and been referred to his com- 
mittee. He reported that the method of rejected fractions was uncon- 
stitutional because it did not meet the second constitutional require- 
ment and apportion seats to the several States according to their re- 
spective numbers. Webster computed the number of representatives a 
few states would have if the proportion each had of 240 was as near 
as possible to the proportion of its population to that of all the states. 
He showed that the population of New York and Vermont, for example, 
would entitle the former to 38.59 representatives and the latter to 5.65 
and claimed that New York should receive 39 seats and Vermont 6 
instead of the 40 and 5 given them in the bill. 

If he had been able to show Congress that the current method must
overrepresent a large state and underrepresent a small one, his
argument might have carried greater weight. He might have said that
under it the average district population in large states would be much
below that in small ones, and that, if Congress should adopt his
amendment, the difference would be only one sixth as great. Stated in a
way perhaps more meaningful to Congress, his method would have
transferred seats from three large states, Kentucky, Pennsylvania, and
New York, to three small ones, Delaware, Missouri and Vermont.

4 Jefferson wrote in a memorandum for which Washington had asked, “Fractions must be neglected
because the Constitution . . . has left them unprovided for.” Writings (1940 ed.), vol. 3, pp. 201-11.

He carried the Senate, but when the House rejected his amendment
the upper house yielded. His failure may have been due even more to
a weakness in his mathematics, which I have explained elsewhere⁵ and
which I was able eighty years later to correct.

Webster’s contributions were that he called attention to the second
constitutional requirement, showed that the results of the current
method violated it, and showed that to apportion supplementary seats
for fractional remainders larger than one half would be a great
improvement. But he answered neither of two questions, the first of
which long vexed Congress: How is the common divisor which his method
needed to be found? The other: How is the problem affected by the third
requirement in the Constitution?

In the period between 1832 and 1910 Congress tried to adapt Web- 
ster’s revolutionary idea to a problem the shape of which was changing. 
In 1842 Congress experimented with his method. The law specified the
common divisor, which he had not known how to compute for a specified
number of seats, and provided for “one additional Representative for
each State having a fraction greater than one moiety of the said ratio,”⁶
but it did not specify the size of the House. When the 1850 census was 
at hand Congress was minded to stabilize the size of the House at 233 
members by a ministerial apportionment. The law instructed the Secre- 
tary of the Interior, first, to divide the combined population of the 
states by 233, then to divide the population of each state by the quo- 
tient, and finally to apportion one seat for each unit and enough sup- 
plementary seats for large fractions to reach the required total. 

This method dodged one of the difficulties in Webster’s proposal,
how to find the common divisor, but two others remained. The method
might yield one or more seats for large minor fractions, or withhold
seats for one or more small major fractions. As long as the House did
not increase in size the method worked well, but after the 1870 census
Congress resumed the policy of apportioning enough seats to keep
every delegation intact, and the experience of forty years with that
change wrecked the method. The Superintendent of the Census began
to send Congress tables based on the 1850 method which showed the
distribution of each number of seats within the limits of size which
interested the House. These tables showed that now and then one or
more states might receive a seat for a large minor fraction, one or
more might fail to receive a seat for a small major fraction, or, when
one member was added to the total, two states might gain and a third
lose a seat (the “Alabama paradox”).

5 Walter F. Willcox, “Last Words on the Apportionment Problem,” Law and Contemporary Problems,
vol. 17, p. 291.
6 Statutes at Large, vol. 9, p. 432.
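The 1850 procedure and the paradox it permits can be reproduced in a few lines. This is a sketch, not the Census Office's actual computation: the function name is illustrative and the three populations (in thousands) are hypothetical, chosen so that enlarging the House from 10 to 11 seats gives two states a seat while the third loses one.

```python
def vinton(populations, house_size):
    """1850-style apportionment.

    Divide the total population by the house size to get the ratio, divide
    each state's population by that ratio, give one seat per whole unit, then
    hand the remaining seats to the largest fractional remainders.
    """
    ratio = sum(populations) / house_size
    quotients = [p / ratio for p in populations]
    seats = [int(q) for q in quotients]
    leftover = house_size - sum(seats)
    # Rank states by the size of their fractional remainder, largest first.
    by_fraction = sorted(range(len(populations)),
                         key=lambda i: quotients[i] - seats[i], reverse=True)
    for i in by_fraction[:leftover]:
        seats[i] += 1
    return seats


if __name__ == "__main__":
    pops = [600, 600, 200]  # hypothetical populations, in thousands
    for size in (10, 11):
        print(size, vinton(pops, size))
```

With 10 seats the smallest state's remainder is the largest and it gets two seats; with 11 seats the two larger states' remainders overtake it, so they gain a seat each and it drops to one.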

As I have said, my connection with the problem began in 1900 when 
I was placed in charge of a division in the Census Bureau which was to 
prepare the apportionment tables. The law said we should use the
Vinton method (the method of the 1850 law), but, as tables based on
earlier figures had shown its weakness, we submitted a second set based
on an imperfect understanding of the Webster method.

The sizes of the House which interested Congress were 357, its exist- 
ing size, and 386, the size at which no state would lose a seat. Our 
Vinton tables contained two examples of the Alabama paradox at 
just these numbers, one affecting Colorado, the other Maine. For Colo- 
rado, the figures ran: 


Size of House 356 357 358 
Seats for Colorado 3 2 3 


For Maine they were: 


Size of House 384 385 386 387 
Seats for Maine 4 4 3 4 


Congress got over the hurdle and apportioned 386 seats by starting 
with the table for 384, which contained two quotients with major frac- 
tions for which no seats were apportioned. For each of these quotients 
Congress gave an extra seat and thus reached 386, the number desired, 
with no state losing a seat and no major fraction unrewarded. 

After following the long House debate I returned to Cornell sure
that the principle of the Webster method was sound but its mathematics
weak. Once the difficulty was grasped, the solution was obvious: apply
the sliding divisor concept. After the 1910 figures had been
announced I took to Washington a set of tables and an explanatory
letter in which I had written:

“The history of reports, debates, and votes upon apportionment 
seems to show a settled conviction in Congress that every major
fraction gives a valid claim to an additional Representative. . . . The
present method is based upon that conviction and seeks to facilitate action
in conformity with it. Because of this feature I have called it the method 
of major fractions. 

“The results are simple, but the method itself is somewhat difficult 
to explain. If a ratio of 240,000 persons to each Representative be 
assumed arbitrarily as a starting point, that number divided into the 
population of each state and one Representative assigned for each 
whole number and each major fraction in the series of quotients, a total 
of 383 Representatives is reached. If the ratio be then diminished by 
10 to 239,990, no difference in the apportionment will result, but the 
decimal in each quotient will be slightly increased. If the ratio be fur- 
ther reduced to 239,980, 239,970, etc., the decimals continue to in- 
crease with each change of ratio, but with varying rapidity. It is a 
simple problem to compute in which State the decimal will first pass 
.500 and become a major fraction and at just what ratio the change 
will occur. In the present case the State whose decimal first reached 
.500 is Illinois and the corresponding ratio is 239,940, . . . which has
been called the boundary ratio.”⁷
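The procedure in the letter can be sketched as follows. The populations are hypothetical (the letter's state figures are not reproduced here), and the function handles only the case the letter describes, a starting ratio that yields too few seats. A state holding s seats at the current ratio gets its next one when its quotient reaches s + .5, that is, when the ratio falls to population / (s + .5); the largest such ratio is the boundary ratio met first.

```python
import math


def webster_by_sliding_divisor(populations, start_ratio, house_size):
    """Major-fractions apportionment found by lowering the ratio step by step.

    Returns the seats and, for each added seat, (state index, boundary ratio).
    """
    # One seat per whole number and per major fraction (.500 or more).
    seats = [math.floor(p / start_ratio + 0.5) for p in populations]
    if sum(seats) > house_size:
        raise ValueError("start_ratio is too small; choose a larger one")
    boundaries = []
    while sum(seats) < house_size:
        # The largest ratio at which some state's decimal reaches .500
        # is the first one met as the ratio is lowered.
        ratio, i = max((p / (s + 0.5), i)
                       for i, (p, s) in enumerate(zip(populations, seats)))
        seats[i] += 1
        boundaries.append((i, ratio))
    return seats, boundaries


if __name__ == "__main__":
    pops = [5300, 3300, 1400]  # hypothetical populations, in thousands
    seats, steps = webster_by_sliding_divisor(pops, start_ratio=1200, house_size=10)
    print(seats)
    for state, ratio in steps:
        print(f"state {state} reaches a major fraction at ratio {ratio:.2f}")
```

Computing the boundary ratio directly replaces the trial reductions of 10 at a time described in the letter, and gives the same apportionment.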

The Bureau of the Census handed Congress two other sets of tables, 
one based on the prescribed Vinton method, the other on a novel 
mathematical analysis of the problem devised by my successor in 
Washington and perfected later by Professor E. V. Huntington. The 
1911 apportionment was based on the Cornell tables. 

The figures of the 1920 census showed that the Huntington method,
or method of equal proportions, would give three seats to small states
(Rhode Island, Vermont and New Mexico) which the Webster method
would give to large ones (New York, North Carolina and Virginia).
This difference in the results was due to the fact that the Huntington
method makes the “critical fraction,” which separates the large
remainders for which seats are apportioned from the small ones for
which they are not, a variable one, lying always below .500 and above
.414. The following computation shows why I preferred the results of
the Webster method.

                      Deviation from population standard
States                Webster method     Huntington method

3 large states            +1.15               -1.85
3 small states            -1.41               +1.59

Total (ignoring sign)      2.56                3.44


The total deviation from the standard is one third greater by the 
Huntington method than by the Webster method. 
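The range .414 to .500 stated above follows from the rounding rule of the method of equal proportions: a quotient between n and n + 1 earns the higher number when it exceeds the geometric mean of the two, the square root of n(n + 1). That rule is standard for the method but is not spelled out in the text; the short sketch below simply tabulates the resulting critical fraction.

```python
import math


def huntington_critical_fraction(n):
    """Fraction above n that a quotient must exceed to earn an (n + 1)th seat
    under the method of equal proportions (rounding point sqrt(n * (n + 1)))."""
    return math.sqrt(n * (n + 1)) - n


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 40):
        print(n, round(huntington_critical_fraction(n), 3))
```

The fraction starts at about .414 for a state with one seat and creeps toward .500 as the delegation grows, which is why the method leans toward small states relative to Webster's fixed .500.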


7 61st Congress, 3d Session, House Report No. 1911, Jan. 13, 1911, p. 9 f. 
The two methods applied to 1930 figures yielded identical results.
Ten years later Congress abandoned the Webster method and adopted 
the Huntington one largely because it would give to Arkansas a seat 
which the Webster method would have given to Michigan. After the 
1950 census the Huntington method gave to Kansas a seat which the 
Webster method would have given to California. Analysis of these 
instances shows how the two tests worked. 

Professor Huntington described his test in these words: “Test of 
equal proportions.—A transfer of a seat from one state to another 
should be made if, and only if, the percentage difference between the 
congressional districts⁸ in the two states would be reduced by the
transfer.”⁹

This leads to the following figures (in thousands):

                  Before transfer           After transfer
State          No. of     District       No. of     District
               seats      population     seats      population

California       31         353            30         341
Kansas            5         381             6         318

Disparity           8.0 per cent              7.5 per cent


Because 7.5 per cent was less than 8.0 per cent, the Huntington test 
gave the seat to Kansas. I cannot see how that test bears on the
problem it tries to solve.
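Huntington's test as quoted can be sketched as a pairwise comparison. Two points are assumptions here: that "percentage difference" means the excess of the larger district population over the smaller, taken as a percentage of the smaller, and the state figures in the usage lines, which are hypothetical rather than the 1950 returns.

```python
def relative_difference(a, b):
    """Percentage by which the larger of two figures exceeds the smaller."""
    return (max(a, b) / min(a, b) - 1) * 100


def huntington_favors_transfer(pop_from, seats_from, pop_to, seats_to):
    """Test of equal proportions for moving one seat between two states.

    The transfer is favored only if it reduces the percentage difference
    between the two states' district populations. Assumes seats_from >= 2.
    """
    before = relative_difference(pop_from / seats_from, pop_to / seats_to)
    after = relative_difference(pop_from / (seats_from - 1), pop_to / (seats_to + 1))
    return after < before


if __name__ == "__main__":
    # Hypothetical states, populations in thousands.
    print(huntington_favors_transfer(1000, 4, 600, 1))  # districts 250 vs 600 before
    print(huntington_favors_transfer(1000, 3, 300, 1))  # districts 333 vs 300 before
```

Note that the test compares two ratios of district populations; it says nothing directly about how far each state's seats depart from its exact share, which is the comparison the next paragraphs make.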

If we compute the number of seats and fractions of a seat to which
the two states were entitled in 1950, California should have had 30.68
and Kansas 5.52. The Huntington method in giving California 30 seats
curtailed its representation by .68 and in giving Kansas 6 seats swelled 
it by .48, a total departure from the standard of 1.16. The Webster 
method in giving California 31 seats would have swollen its representa- 
tion by .32 and in giving Kansas 5 would have curtailed it by .52, a 
total departure from the standard of .84, about three fourths as much. 
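The arithmetic of the preceding paragraph, written as a sketch: the total departure from the standard is the sum, over the two states, of the absolute gap between exact quota and seats received. The quotas are the figures given in the text; the function name is illustrative.

```python
def total_deviation(quotas, seats):
    """Sum of the absolute gaps between each state's exact quota and its seats."""
    return sum(abs(s - q) for q, s in zip(quotas, seats))


quotas = [30.68, 5.52]                          # California, Kansas, 1950 (from the text)
huntington = total_deviation(quotas, [30, 6])   # seats under equal proportions
webster = total_deviation(quotas, [31, 5])      # seats under major fractions
print(f"Huntington {huntington:.2f}, Webster {webster:.2f}, "
      f"ratio {webster / huntington:.2f}")
```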

Another way of comparing the methods may be clearer. Each
method gave the two states together 36 representatives. The question 
is, should California have received 31 and Kansas 5, or California 30 
and Kansas 6? Since California had 84.75 per cent of the population of 
the two states it should have that per cent of the 36 seats, in other 
words 30.51 seats for it and 5.49 for Kansas; so California had the 
stronger claim to the transferable seat. 

Cumulative evidence comes from applying each test to the 1940 figures
for Michigan and Arkansas. Each method gave those states in
combination 24 seats; the question was, should Michigan receive 17
and Arkansas 7, or Michigan 18 and Arkansas 6? Michigan was entitled
by its population to 17.43 seats and Arkansas to 6.04, so Michigan
with 17 was curtailed by .43 and Arkansas with 7 was strengthened
by .96, a total deviation of 1.39. But if Michigan had received 18 seats
and Arkansas 6, the total deviation would have been only .61 (.57
and .04), less than half as much. Michigan had 72.95 and Arkansas
27.05 per cent of the population of the two states, so the former was
entitled to 17.51 and the latter to 6.49 out of the 24 seats. Evidently
Michigan should have been given the transferable seat.

8 He meant by congressional districts what I prefer to call district populations.
9 E. V. Huntington, “Methods of Apportionment in Congress,” in 76th Congress, 3d Session,
Senate Document No. 304, p. 3.
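The pairwise argument used for both the 1950 and the 1940 cases can be sketched as follows: split the seats the two states hold jointly in proportion to their shares of the two states' combined population, then take the nearest whole-number split. The shares and seat totals are those given in the text; the function name is illustrative.

```python
def split_pair(share_of_first, joint_seats):
    """Exact and nearest whole-number division of two states' joint seats."""
    exact_first = share_of_first * joint_seats
    exact_second = joint_seats - exact_first
    nearest_first = round(exact_first)
    return exact_first, exact_second, nearest_first, joint_seats - nearest_first


for first, second, share, joint in [("California", "Kansas", 0.8475, 36),
                                    ("Michigan", "Arkansas", 0.7295, 24)]:
    ea, eb, na, nb = split_pair(share, joint)
    print(f"{first} {ea:.2f} -> {na}, {second} {eb:.2f} -> {nb}")
```

In both pairs the larger state's exact share lies above the half-way point, so the nearest split gives it the transferable seat.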

The argument thus far has indicated that Congress erred when it 
adopted the Huntington method and abandoned that of Webster. 

We come to the last question, What effect has the requirement, 
“Each State shall have at least one Representative” on the problem? 
That question, but not its answer, I saw through a glass darkly when I
wrote, “The Vinton method ... involves a fundamental theoretical
error.¹⁰ It overlooks the crucial fact that seats in the House of
Representatives are of two classes, the 48, one for each state, which are
guaranteed by the Constitution and are as completely beyond the con- 
trol of Congress as the seats of the Senators are, and the remainder, 
the number and distribution of which are under congressional control. 
The two classes might be named the apportionable and the unappor- 
tionable seats. The fact that they are not individually distinguishable 
has apparently been responsible for the failure to recognize their exist- 
ence. To get this theoretical requirement clearly in mind it may be 
helpful to think of the seats in the House of Representatives as num- 
bered. The first 48 seats, one for each state, would be numbered one 
to indicate that there is no basis for distinguishing between them. The 
next seat, numbered 49, would be apportioned to New York, number 
50 to Pennsylvania, and so on.”¹¹

This distinction has now been recognized by all authorities, and
apportionment tables give the states to which seats go in succession
from No. 49 on to No. 435. Under 1950 figures both the present and the
proposed method would give seats 49, 50, 51, and 52 to the largest
states, New York, California, Pennsylvania and Illinois, in the order of
their size, but they differ over seat 53. The present method gave that
seat to New York with a district population (in thousands) of 7,415,
although Ohio, the fifth state in size, had a larger district population,
7,947, and in my judgment should have received the seat. If Congress
should agree with me on that point, its choice would entail a decision
that the method of included fractions leads to better results than the
method of equal proportions and should displace it at the first
opportunity.

10 Later I realized that the Webster method in either form and the Huntington method involve the
same error.
11 Walter F. Willcox, “The Apportionment of Representatives; Annual Address of the President,”
American Economic Review, 6 (1916) Supplement, pp. 3-16.
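The disagreement over seat 53 can be checked with a sketch that compares two claims to the next seat. The populations in thousands are derived from the text (New York's district population of 7,415 with two seats implies about 14,830; Ohio holds one seat with 7,947). The priority formula population / sqrt(n(n + 1)) is the standard way of running the method of equal proportions seat by seat; it is not given in the text and is stated here as an assumption.

```python
import math

# (population in thousands, seats currently held), derived from the text.
states = {"New York": (14_830, 2), "Ohio": (7_947, 1)}


def equal_proportions_priority(pop, seats):
    # Claim to the next seat under the method of equal proportions.
    return pop / math.sqrt(seats * (seats + 1))


def district_population(pop, seats):
    # Current district population; the text argues the larger one should win.
    return pop / seats


for rule in (equal_proportions_priority, district_population):
    winner = max(states, key=lambda name: rule(*states[name]))
    print(f"{rule.__name__}: seat 53 -> {winner}")
```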

In ending the argument about methods of apportionment we may 
summarize the conclusions. 

Methods are of two kinds, primary and secondary, the former finding 
a root in the Constitution, the latter not finding such a root. 

There are three primary methods, one based on “not more than one 
in every thirty thousand,” another based on “according to their re- 
spective numbers,” and a third based on the attempt to combine the 
second requirement with the last, “Each State shall have at least one
Representative.”

The results of these three methods differ in that the first apportions 
no supplementary seats for remainders, the second apportions such 
seats for remainders larger than one half, the third like the first appor- 
tions no seats for remainders but unlike it assigns one seat to each 
state before apportionment begins. 

The results also differ in their distribution of transferable seats, the 
number of which has increased from two after the first census to sixteen 
after the last. The first method distributes all transferable seats among 
the large states, the second divides them as evenly as possible between 
the large and the small states, the third distributes them all among 
the small states. 
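The contrast among the three primary methods can be sketched with hypothetical populations. The sketch assigns seats one at a time to the state with the largest population divided by a rule-specific divisor; for these rules that gives the same result as sliding a common divisor. The third rule is implemented as the text describes it (one seat to each state first, then seats only for whole units), which amounts to a state holding n seats claiming the next by population / n. Names and data are illustrative assumptions.

```python
def apportion(populations, house_size, start, divisor):
    """Give each state `start` seats, then award the remaining seats one at a
    time to the state with the largest population / divisor(current seats)."""
    seats = [start] * len(populations)
    for _ in range(house_size - sum(seats)):
        best = max(range(len(populations)),
                   key=lambda i: populations[i] / divisor(seats[i]))
        seats[best] += 1
    return seats


METHODS = {
    # First: seats only for whole units, remainders rejected.
    "rejected fractions": (0, lambda n: n + 1),
    # Second: an extra seat for every remainder larger than one half.
    "major fractions": (0, lambda n: n + 0.5),
    # Third: one seat to each state first, then seats only for whole units.
    "one seat first, no remainders": (1, lambda n: n),
}

if __name__ == "__main__":
    pops = [6200, 2400, 1400]  # hypothetical large, small and smaller state
    for name, (start, div) in METHODS.items():
        print(f"{name:30} {apportion(pops, 10, start, div)}")
```

With these figures the first rule gives the disputed seats to the large state, the second gives one to the middle state, and the third gives one to the smallest, matching the pattern described above.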

The Constitution seems to leave with Congress a choice between 
two tests and two methods of apportionment. The first test would be 
based on the nearness of two proportions, on the one hand the propor- 
tion that a state’s population makes of the population of all the states, 
and on the other the proportion that the number of a state’s repre- 
sentatives makes of the whole number of representatives. The second 
test would be based on the nearness of the district populations of the 
48 states to one another. 

The number of methods of apportionment giving different results 
varies with the figures of a census but is always one more than the 
number of transferable seats. The seventeen results possible after the 
1950 census would come from three primary and fourteen secondary 
methods; these seventeen methods make up a series wherein the results 
of each would differ from those on either side of it by transferring one 
seat from a large state to a small one or vice versa. 

The methods reaching these results use a series of seventeen divisors 
with 329,577 at one end and 365,394 at the other. They use also a
series of seventeen critical fractions with zero at one end and one at the
other.¹²

The article may close with a few words about another amendment 
now before the House Judiciary Committee. It provides for a slight 
automatic reduction in the size of the House after each future census. 
The law now declares that the President shall report to Congress after 
each census the result of redistributing the then existing number of 
members among the states according to the method last used by
Congress. The amendment would insert the words, “ten less than” before
the words, “the then existing number.” 

The House is now about four and a half times as large as the Senate; 
at the start it was only two and one half times as large. In state legis- 
latures the difference between the size of the two branches is much 
less, and, since as a rule the more recent a state constitution is, the 
nearer to equality in size are the upper and lower houses, it would seem 
that American experience has led to a reduction of the average differ- 
ence. 

More important evidence comes from members of the House. Six- 
teen years ago a Congressman acting at my suggestion asked each of his 
colleagues whether he was satisfied with the present size of the House 
and, if not, whether he wanted it larger or smaller. Among the one 
third who replied, one half were satisfied with the present size of the 
House; of the other half about nine tenths wanted it smaller. 

More evidence to the same effect came from the apportionment de- 
bate in the 1920’s; then fifteen experienced and influential representa- 
tives gave an opinion; all but two wanted the House smaller. Probably 
no one now a member can recall, as Congressman Burton then could, 
when it was only three quarters as large. He had begun his long service 
more than forty years before and had been for six years a Senator. He 
said: “I began when there were 325 members of this body, and the 
disadvantages in the transaction of business now as compared with 
then are beyond my powers to describe. It is not only the greater ex- 
pense but the greater confusion on the floor and the greater difficulty 
in the orderly transaction of work. .. . I would rather see this House 
consist of 300 members than 435.” 

With the size of the House stabilized as it has been for fifty years,
1910-1960, the decennial increase in the average population of a
congressional district is about 44,000. If the size of the House had been
reduced by ten members after the last census, that increase would
have been about 49,000, a difference probably no member would worry
about.

12 “The Apportionment Problem and the Size of the House; A return to Webster,” Cornell Law
Quarterly, vol. 35 (1950), pp. 367-89; and “Last Words on the Apportionment Problem,” Law and
Contemporary Problems, vol. 17 (1952), pp. 290-301.

If such an amendment should be adopted, the business of the House
might be done faster and better, and debating the amendment, even if
Congress took no action on it, would bring home to Congress the fact
that it can now change the size of the House slowly up or down without
being blocked, as it often was in the past, by a tiny pressure group.





THE KINSEY REPORT ON FEMALES* 


Dorothy S. Brady
Washington, D. C. 


The first chapter of “Sexual Behavior in the Human Female” contains
four pages of persuasion in three sections: “The Scientific
Objective,” “The Right to Investigate” and “The Individual’s Right 
to Know.” The argument begins in the first section with three cautiously 
phrased sentences— 




It should be clearly understood that the original goal of our study was 
the extension of our knowledge in an area in which scientific information
appeared to be limited. In the course of the years it has become apparent 
that the data we have acquired may prove of value in the consideration of 
some of our social problems, but that was not why we originally began this 
research. 

It has been the history of science that any addition to our store of ade- 
quately established knowledge may ultimately contribute to man’s mastery 
of the material universe. (p. 7) 


progresses to the emotional level in the second section— 


The scientist who observes and describes the reality is attacked as an 
enemy of the faith, and his acceptance of human limitations in modifying 
that reality is condemned as scientific materialism. But we believe that an 
increased understanding of the biologic and psychologic and social factors 
which account for each type of sexual activity may contribute to an ulti- 
mate adjustment between man’s sexual nature and the needs of the total 
social organization. (p. 10) 


and ends in the third section with a dramatic defense of scientific free- 
dom and individual rights— 


... We believe that the scientist who obtains his right to investigate from 
the citizens at large, is under obligation to make his findings available to all 
who can utilize his data. Any scientist who fails to report or to place his
findings in channels where they may serve the maximum number of persons, 
fails to recognize the sources of his right to investigate and thereby jeopar- 
dizes the rights of all scientists to investigate in any field... . 

... We believe that if we have any right to investigate in this field, we are 
under obligation to make the results of our investigations available to all 
who can read and understand and utilize our data. (pp. 10, 11) 


The obligation to make the findings available to the maximum num- 
ber of persons was fulfilled with all the craft and skill of modern pub- 
licity. Yet the message conveyed to the public is not at all unequivocal. 





* A review article on Sexual Behavior in the Human Female, by Alfred C. Kinsey, Wardell B. Pom- 
eroy, Clyde E. Martin, and Paul H. Gebhard. Philadelphia and London: W. B. Saunders Company, 
1953. Pp. 842; $8.00. 
As a sociologist put it, “Even the careful reader, trying to avoid selec-
tive bias, is not always sure where these investigators stand. They fre- 
quently hint at adverse evaluations of our religious-moral traditions 
but draw back in the end, leaving evaluative issues for future study. 
This somewhat confused situation is providing a field day for moralists. 
Both outraged traditionalists, scanning these books for purple passages, 
and opponents of religion, eager to prove the validity of their
preconceptions, are finding what they want to find.”¹

The issues surrounding the individual’s right to know the results of 
publicly sponsored investigations should be considered apart from the 
subject of the research. The social control of scientific research ulti- 
mately relies on professional codes of practice. Neither naiveté nor 
indifference but plain common sense on the part of the general public 
and their representatives in legal office or on boards of foundations 
delegates the responsibility for research to the scientists as a group. 
There are not many important subjects of research that can be pre- 
sented so that all men, women and children could pretend to “read, 
understand and utilize the data.” Medicine is replete with examples of 
research on subjects of such grave importance to the general public 
that progress towards scientific findings makes the headlines of daily
newspapers. Yet the sifting of contradictory evidence, the synthesis 
of generalizations, and the validity of applications are trustingly left 
to the medical profession. Kinsey’s interpretation of the code of the 
scientist, 

... to investigate honestly, to observe and to record without prejudice, 
to observe as adequately as human sense organs or the most modern
instruments may allow, to observe persistently and sufficiently in order that there
may be an ultimate understanding of the basic nature of the matter which
is involved. (p. 10) 


stops short of the principle which safeguards the public interest in 
scientific research. 

Modern science and its technological applications grew out of the
concept of an experiment that can be repeated by different observers. 
Historians tell us that this idea in its time was revolutionary enough to 
create great public interest. The assurance that the results of an ex- 
periment will be repeated again and again is fundamental in the appli- 
cation of the results of scientific research. The results of an experiment 
are quantitative generalizations. The mere recording of observations 
has seldom led to a conflict between the observers and the legal or
religious authority. The conflicts have centered around the quantitative
generalizations that constituted attacks on whole systems of ideas.
The scientists with whom Kinsey compares himself—Kepler, Copernicus,
Galileo, and Pascal—were not primarily observers; they generalized
astronomical observations that had been accumulating for thousands
of years.

1 Claude C. Bowman, “Sociological Implications of the Kinsey Studies,” Temple University, paper
presented to the Eastern Sociological Society, April 4, 1954.

Astronomy in the nineteenth century met a single historical event,
the problem of the observation that was difficult, if not impossible, to
replicate. Towards the end of the century, as the mass of scientific data
and generalizations became large and complex and empirical methods 
were extended to the biological and social sciences, the concept of repli- 
cation had to be given a new operational definition equivalent to its 
literal meaning. 

Modern statistical procedures give the answer to a serious question. 
What can be done to assure the validity of the results of observations 
that, literally, cannot be repeated? It is no historical accident that Karl 
Pearson, one of the founders of modern statistics, tried to reformulate 
the nineteenth century concept of the repeated experiment in his 
Grammar of Science. It is no historical accident that his son, Egon 
Pearson, among others, brought statistical procedures to the testing of 
hypotheses. Statistical theory offers means for appraising the results 
of investigations that serve the function that the actual repetitions of 
the experiment generally served in the past and still serve in many 
provinces of scientific research. A profession recognizes the work of one 
of its members after scrutiny of all the operational phases of the inves- 
tigation. Statistical methods are involved in the appraisal at the pri- 
mary level of sampling and at the level of quantitative generalization. 

The specification of basic definitions and concepts is not within the 
province of statistical methods. It is the random or systematic vari- 
ability in observations of particular phenomena or in reporting on 
particular forms of behavior that the statistician is equipped to study. 
The formulation of the operating hypotheses on the basis of which 
experiments or surveys are designed is also not a function of the 
statistician. Statistical procedures implement a principle expressed 
nearly a century ago—“Wrong hypotheses, rightly worked, have produced
more useful results than unguided observations.”²

Kinsey’s haste to present his findings to all those who might want to 
apply them has forced into the popular press a process usually con- 
ducted quietly through technical journals and conferences. The public 
may be diverted by all the controversies over the validity and importance
of the Kinsey findings, but large groups of specialists in the theoretical
and applied fields concerned with the subject matter are understandably
bewildered.

2 De Morgan, “A Budget of Paradoxes,” Open Court Edition, p. 87.

The use of statistical survey methods in the study of sex behavior 
did not originate with the Kinsey research. Kinsey’s volumes refer to 
earlier studies and Appendix B of the report of the American Statistical 
Association’s committee³ summarizes important differences between
methods used in the earlier surveys and in the Kinsey survey. The 
data from a survey in a related field, “Social and Psychological Factors 
Affecting Fertility,”⁴ which was conducted in Indianapolis in 1941, and
was also financed by a foundation, have been carefully analyzed in 
more than twenty technical articles by demographers and sociologists. 
Many of the original hypotheses formulated for testing with the data 
have been rejected. The final summaries present the results of all the 
statistical tests. The data from the Indianapolis survey and from the 
other surveys of sex behavior do not display the neat and systematic 
differences or similarities between population groups that Kinsey and 
his associates have discovered in their study of the human female. 

The Kinsey group places great emphasis on its interviewing techniques
and approach to the respondents. The evaluation of these techniques in
the report of the Committee of the Association and in Wallis’ review⁵
does not need to be summarized in this connection. All of the
questions about the sample of males also relate to the sample of females 
which is, “...even more inadequate than our sample of males in 
representing lower educational levels, rural groups, and some of the 
other segments of the population” (p. 57). Yet the relationships shown 
in the analysis of female sex behavior are uniform and regular com- 
pared to the results from most large-scale surveys of much less variable 
types of human behavior. Other investigations, even with longer and 
much more intensive statistical analysis, do not produce such elegant 
statistical results. 

Here lies the paradox. What is the Kinsey secret that, from an in- 
adequate sample, produces results of a stability that others cannot 
duplicate? Even if the original reports by individuals interviewed were 
absolutely accurate, the mathematical uniformities in the correlations
would be surprising. When there is a significant difference in sex
behavior between groups differentiated by some social characteristic,
there are few cases of inversions in the rankings. When there is no
correlation, the absence of a significant relation can scarcely be
questioned.

3 Appendix B of the report of a committee appointed in 1950 by S. S. Wilks as President of the
American Statistical Association to review the statistical methods used by Alfred C. Kinsey, Wardell B.
Pomeroy, and Clyde E. Martin in “Sexual Behavior in the Human Male” (Philadelphia, W. B. Saunders
Co., 1948), to be published in a monograph by the American Statistical Association in 1954.

4 Clyde V. Kiser and P. K. Whelpton, “Résumé of the Indianapolis Study of Social and Psychological
Factors Affecting Fertility,” Population Studies, Vol. 7, No. 2, November 1953, Great Britain.

5 W. Allen Wallis, “Statistics of the Kinsey Report,” Journal of the American Statistical Association,
44 (1949), pp. 463-84.

Most social surveys offer a multiplicity of hypotheses about the con- 
nection between a particular form of human behavior and the demo- 
graphic, social, and economic characteristics of various population 
groups. The choice of the hypotheses to be tested, evaluated, and pre- 
sented to the general public more and more has to be made by the 
investigating agency with such advice from specialists in the subject 
matter and in statistical methods as it can marshal on the problem of
selection. It is doubtful whether the best hypotheses in a social survey 
would ever yield as many unambiguous comparisons as the Kinsey 
study of the human female. Certainly those hypotheses—and they are 
not necessarily the best—that lead to published data do not reach the 
Kinsey standard. 

The particular detail in the Kinsey tables that does not conform to 
experience with other empirical data is the virtual absence of zero fre- 
quencies in the incidence tables that are published. When frequencies 
of a particular form of behavior run low—under 15 per cent of the 
population group—other empirical studies often show a zero frequency 
in the sample reports for subgroups of the population even when the 
number reporting in that subgroup is substantial—say between 50 and 
500. The statistical reason for a wide variation in the percentages and 
averages in samples so large is to be sought in the theory of multivariate 
distributions. Most social and economic characteristics of the popula- 
tion are highly intercorrelated. The accidents of sampling may result 
in a subgroup defined by one or two factors that differs widely from the 
total population represented by that subgroup with respect to other 
factors. 
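The contrast between simple binomial sampling and sampling from intercorrelated populations can be made concrete with a little arithmetic. This is a minimal sketch, not a model of any actual survey.

- It assumes responses are independent, with subgroup incidence p. The chance of a zero count among n respondents is then (1 - p)^n.
- In the second function, the incidence of the subgroup actually drawn varies because the factors defining it are correlated with others. Here it is hypothetically 1 or 9 per cent with equal chance, a mean of 5 per cent.
- The function names, the rates and the subgroup size of 50 are all illustrative.

```python
# Minimal sketch; rates and subgroup sizes are hypothetical, not survey data.


def zero_prob(p: float, n: int) -> float:
    """Chance that none of n independent respondents reports a behavior
    whose incidence in the subgroup is p: (1 - p) ** n."""
    return (1.0 - p) ** n


def mixed_zero_prob(rates: list[float], weights: list[float], n: int) -> float:
    """Chance of a zero count when the subgroup actually drawn has an
    incidence that varies (with the given weights) because its defining
    factors are correlated with other factors: a weighted average of
    (1 - p) ** n over the possible incidences."""
    return sum(w * zero_prob(p, n) for p, w in zip(rates, weights))


if __name__ == "__main__":
    n = 50
    for p in (0.15, 0.10, 0.05, 0.02):
        print(f"uniform p={p:.2f}, n={n}: P(zero) = {zero_prob(p, n):.4f}")
    # Same 5 per cent average, but the subgroup drawn is either a
    # low-incidence (1%) or a high-incidence (9%) mixture.
    mixed = mixed_zero_prob([0.01, 0.09], [0.5, 0.5], n)
    print(f"heterogeneous, mean p=0.05, n={n}: P(zero) = {mixed:.4f}")
```

Because (1 - p)^n is convex in p, any such variation in incidence raises the chance of a zero for a given average rate. With n = 50, the chance rises from about 0.08 at a uniform 5 per cent to about 0.31 under the mixture. By contrast, a zero at a uniform 15 per cent is almost impossible (about 0.0003).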

The uniformities of the Kinsey correlations and the acknowledged 
variability in sexual behavior lead to only one conclusion. Most of the 
Kinsey findings must be based on a few, relatively incontrovertible 
relations. The apparent precision of his results must be based on some- 
thing that is effectively a tautology. 

When the factors associated with a particular form of behavior are 
correlated with each other, a single simple and perhaps logical relation 
can be reproduced indefinitely through one-factor correlations that 
appear to be highly significant. The Kinsey report on the human female 
finds a systematic inverse connection between sexual behavior of fe- 
males and religious activity and a lack of relation with educational 
attainments which were found significant in the study of males. Both
of these results may be merely reflections of a correlation of age and 
marital status with religion and education among the women inter- 
viewed by Kinsey and his associates. 

The report on the human female provides in the first chapter some 
information on the basic sample not offered in the volume on the hu- 
man male. With the information on the distribution of the sample by 
religious groups and age at the time of interview and the cumulated 
tables in the rest of the volume, it is possible to trace some of the inter- 
correlation of the religious classification and age and—less completely— 
the connection between religious activity and marital status. Of the 
devout females, Protestant, Catholic, or Jewish, relatively more were 
under 25 years of age than of the moderately active or inactive. The 
proportion of devout Protestant females in the age bracket 16-20 was
32 per cent, approximately twice the percentage of inactive Protestants
of the same age. Among devout Catholic and Jewish females, 33 and 52
per cent respectively were in that age bracket; among the inactive, 19
and 39 per cent. The difference in the age distributions among the groups
classified by the degree of religious adherence means that more older 
females were reporting on their activities in the younger ages among 
those inactive religiously than among the devout. The average age of 
those reporting on activity at a given age differs among the groups 
classified by religious background much as their reported sexual activity 
differs. 

The cumulated tables reveal that there were more females married 
in the younger age groups among those who were inactive than among 
those who were moderate or devout in their religious activities. With 
some difficulty it is possible to read another difference among the re- 
ligious groups from the tables presented in this volume. Relatively 
more previously married females appeared among the inactive than 
among the moderate or devout religious groups. 

All of the relations between religious affiliation and sexual behavior 
may be a reflection of a more fundamental association between age,
marital status, and the frequencies of various forms of sexual experi- 
ence reflected in the Kinsey samples. The connection between sexual 
experience and marital status in the case of females can hardly be 
debated and Kinsey’s results can easily be confirmed by the man on 
the street who has read the Bible and observed the social practices in 
his own time. The connection with age probably can not be explained 
as a cultural tautology. 

The report on the human female shows changes in female behavior that
occurred in the nineteen-twenties and have
persisted until the present time. The relation between sexual behavior 
and decade of birth, and by inference to age, brings back that very 
inconvenient problem of the statistical sample. The Kinsey sample of 
females was constituted mainly of women who had been to college, at 
least for a few years, and whose parents had the capacity to finance 
girls as well as boys through college. The girls who went to college 
prior to World War I must have differed more from the general popula- 
tion of females than those whose parents were able to support this par- 
ticular luxury during the twenties. The Kinsey relation between decade 
of birth and female sexual behavior may be only a reflection of the more 
representative selection of the total female population that entered 
colleges and universities after the close of World War I. The popular 
literature of that decade stressed the appearance of the girl in search 
of a husband on the college campus. If, prior to the war, the woman 
who went to college was selected by her own drive towards a profes- 
sional activity or attainment, her behavior in other respects may have 
been very atypical. 

The Kinsey report on the female comes to a sweeping conclusion 
about the educated women still single in their older years: 

When such frustrated or sexually unresponsive, unmarried females at- 
tempt to direct the behavior of other persons, they may do considerable 
damage. There were grade school, high school, and college teachers among 
these unresponsive or unresponding females. Some of them had been di- 
rectors of organizations for youth, some of them had been directors of insti- 
tutions for girls or older women, many of them had been active in women’s 
clubs and service organizations, and not a few of them had had a part in 
establishing public policies. Some of them had been responsible for some of 
the more extreme sex laws which state legislatures had passed. Not a few of 
them were active in religious work, directing the sexual education and trying 
to direct the sexual behavior of other persons. Some of them were medically 
trained, but as physicians they were still shocked to learn of the sexual ac- 
tivities of even their average patients. If it were realized that something 
between a third and a half of the unmarried females over twenty years of 
age have never had a completed sexual experience, parents and particularly
the males in the population might debate the wisdom of making such women
responsible for the guidance of youth. (p. 526) 


In all, there were 299 unmarried females 36 years of age or older, and of
these 112 were 46 or older.

With so many of them teachers, directors of institutions, and medi- 
cally trained as the text of this report suggests, it is possible to reach 
only one conclusion. These were the women of an earlier era dedicated 
to a professional life and—perhaps of more importance in the present 
context—were among those who “covered up” their reports on experience.
The struggle of women for professional recognition in many fields
is over. The particular type of determined individual who entered pro- 
fessional training before 1918 and may have been represented by the 
few that came into Kinsey’s sample can not be considered representa- 
tive of women of the same age in the years to come. 

Whether the relation between age, decade of birth, and sexual be- 
havior relates only to Kinsey’s sample or to a more general group of 
the female population, differences in age and marital status explain 
most of the correlations found between sexual behavior and religious 
activity. In other words, it appears that sexual activity explains re- 
ligious inactivity. The smoothness of the Kinsey correlations relating 
sexual activity inversely to religious activity is apparently the regu- 
larity imposed by dividing a range of a variable into three parts, low, 
medium, and high. 

Had the authors standardized their comparisons for age and marital 
status, and had marital status been defined in four groups to include 
the status of “being engaged,” all of the generalizations in this volume 
about the connection between sexual behavior and social factors might 
have been changed. 
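Direct standardization, the adjustment this paragraph calls for, applies each group's age-specific rates to one common age distribution. The sketch is hypothetical.

- The 32 per cent share of devout Protestants in the 16-20 bracket is taken from the text.
- The inactive share of 16 per cent is inferred from "approximately twice."
- The two-bracket split and the incidence rates are assumptions, chosen so that the groups behave identically within each age bracket.
- The pooled standard population is likewise an assumption.

```python
# Hypothetical illustration of direct age standardization.
# 32% is from the text; 16% is inferred; all rates are assumed.

AGE_GROUPS = ("16-20", "21 and over")

# Share of each religious group's sample falling in each age bracket.
age_mix = {
    "devout":   {"16-20": 0.32, "21 and over": 0.68},
    "inactive": {"16-20": 0.16, "21 and over": 0.84},
}

# Assumed incidence of some behavior: identical for both groups
# within each age bracket, higher in the older bracket.
rate = {
    "devout":   {"16-20": 0.20, "21 and over": 0.50},
    "inactive": {"16-20": 0.20, "21 and over": 0.50},
}


def crude_rate(group: str) -> float:
    """Rate over the group's own age mix (what a one-factor table shows)."""
    return sum(age_mix[group][a] * rate[group][a] for a in AGE_GROUPS)


def standardized_rate(group: str, standard: dict[str, float]) -> float:
    """Direct standardization: the group's age-specific rates applied
    to a common standard age distribution."""
    return sum(standard[a] * rate[group][a] for a in AGE_GROUPS)


# Standard population: the average of the two age mixes (an arbitrary choice).
standard = {a: (age_mix["devout"][a] + age_mix["inactive"][a]) / 2 for a in AGE_GROUPS}

if __name__ == "__main__":
    for g in ("devout", "inactive"):
        print(f"{g:>8}: crude {crude_rate(g):.3f}   "
              f"age-standardized {standardized_rate(g, standard):.3f}")
```

The crude rates differ (0.404 against 0.452), purely because of the age mix. The standardized rates coincide (0.428). Marital status can be handled the same way by cross-classifying age and marital status into cells. When age-specific rates do differ between groups, the standardized figures depend on the standard population chosen.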

Kinsey and his associates apparently started with a fairly simple 
hypothesis about the sexual behavior of human females—not too well 
formalized to be sure—that it has an anatomical and physiological 
similarity with the males, but is inhibited by social, legal, and other 
cultural codes and practices. 

Through the series of tables describing the sexual activities of the 
religious groups of females there is a causal chain that deflected Kinsey 
and his associates from their initial and relatively simple operating 
hypothesis. The greater sexual activity of the religiously inactive fe- 
males appears to lead inevitably to a higher proportion of broken mar- 
riages. The Kinsey inference that greater sexual experience for the 
female before marriage leads to quicker and more satisfactory sexual 
marital relations is contradicted by two correlations in his survey. 
Pre-marital hetero-sexual experience is correlated with extra-marital 
experience. The groups that have the highest incidence of extra-marital 
experience include the greatest number of marriages broken by sepa- 
ration and divorce. 

The authors contend with these inferences from their study in vari- 
ous oblique ways. Early in the text they state: “The generalizations 
throughout the present volume have therefore been restricted to the 
particular sample that we have had available.” Progressively, toward the
final chapters, they throw away this restriction. The book would make
an excellent subject for textual analysis by the techniques of historians
and Biblical scholars.


There are some who have feared that a scientific approach to the prob- 
lems of sex might threaten the existence of the marital institution. There 
are some who advocate the perpetuation of our ignorance because they fear
that science will undermine the mystical concepts that they have substituted 
for reality. But there appear to be more persons who believe that an exten- 
sion of our knowledge may contribute to the establishment of better mar- 
riages. (p. 13) 

There are legal and social responsibilities in any marriage; there are 
economic problems to be solved; above all, there are psychologic adjust- 
ments to be made between the wedded partners. Sexual adjustments repre- 
sent only one aspect and not necessarily the most important aspect of mar- 
riage. No balanced program for American youth can be confined to prepar- 
ing them for sexual relationships in marriage. But it is inconceivable that 
anyone who is objectively and scientifically interested in successful mar- 
riages should fail to appreciate the significance of coitus in marriage, or 
wholly ignore the correlations which exist between pre-marital activities and 
the sexual adjustments which are made in marriage. (p. 391) 

These correlations between pre-marital and extra-marital experience may 
have depended in part upon a selective factor: the females who were inclined 
to accept coitus before marriage may have been the ones who were more 
inclined to accept non-marital coitus after marriage. A causal relationship 
may also have been involved, for it is not impossible that non-marital coital 
experience before marriage had persuaded those females that non-marital 
coitus might be acceptable after marriage. (pp. 427-428) 

Extra-marital coitus had figured as a factor in the divorces of a fair 
number of the females and males in our histories. We have data on 907 indi- 
viduals (female and male) who had had extra-marital experience and whose 
marriages had been terminated by divorce. We have the subjects’ judgments 
of the significance of their extra-marital coitus in 415 cases. In nearly two- 
thirds (61 per cent) of these cases, the subject did not believe that his or her 
own extra-marital activity had been any factor in leading to that divorce. 
... It is to be noted, however, that these were the subjects’ own estimates 
of the significance and, as clinicians well know, it is not unlikely that the 
extra-marital experience had contributed to the divorces in more ways and 
to a greater extent than the subjects themselves realized. (p. 435) 

These data once again emphasize the fact that the reconciliation of the 
married individual’s desire for coitus with a variety of sexual partners, and 
the maintenance of a stable marriage, presents a problem which has not been 
satisfactorily resolved in our culture. It is not likely to be resolved until man 
moves more completely away from his mammalian ancestry. (p. 436) 

The failure to recognize these differences in the needs of the two sexes for 
a regular sexual outlet may be the source of a considerable amount of diffi- 
culty in marriage. It is the source of many social disturbances over questions 
of sex. In establishing sex laws, in considering the sexual needs of females 
and males in penal and other institutions, in considering the need among 
females and among males for non-marital sources of sexual outlet, and in
various other social problems, we cannot reach final solutions unless we 
comprehend these considerable differences between the sexual needs of the 
average female and the average male. (p. 682) 

The possibility of reconciling the different sexual interests and capacities 
of females and males, the possibility of working out sexual adjustments in 
marriage, and the possibility of adjusting social concepts to allow for these 
differences between females and males, will depend upon our willingness to 
accept the realities which the available data seem to indicate. (pp. 688-689) 


The reality the authors found, perhaps reluctantly, in the last chap- 
ter, namely, a fundamental difference between males and females, is 
supported by information from the survey that shows “the male’s 
greater inclination to be promiscuous” (p. 683). The table shows the 
number of partners reported by males and females in pre-marital pet- 
ting and pre-marital coitus. The difference in average numbers of 
partners in pre-marital coitus reported by females and by males is so 
great that, in view of the fact that the numbers of males and females 
in the population are nearly equal, only one conclusion can be reached: 
that the samples of males and females came from different populations. 
This possibility is admitted in the first chapter in a discussion of the 
sample of females. The discrepancy in the data on this subject, sum- 
marized differently, is explained on page 79 in fine print. Reasons 
given, among others, are differences in the distribution of the samples 
by educational attainment, the omission of prostitutes from this re- 
port, the fact that some of the men reported on experience abroad while 
in the armed forces, and the possibility that “the females may have 
covered up in reporting their pre-marital experience, or the males may 
have exaggerated their reports of such experience.” 
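The "only one conclusion" rests on a counting identity. This is a sketch under stated assumptions: the population is closed, every pre-marital coital partnership counted is between one male and one female of that population, and both partners report it accurately. The symbols are introduced here and do not appear in the report.

- $N_m$ and $N_f$ are the numbers of males and females.
- $k_i$ is the number of female partners of male $i$.
- $\ell_j$ is the number of male partners of female $j$.
- $P$ is the number of distinct male-female partnerships.

Each partnership is counted once from each side, so

$$\sum_{i=1}^{N_m} k_i \;=\; P \;=\; \sum_{j=1}^{N_f} \ell_j
\quad\Longrightarrow\quad
N_m\,\bar{k} = N_f\,\bar{\ell},
\qquad
\frac{\bar{k}}{\bar{\ell}} = \frac{N_f}{N_m} \approx 1 .$$

A large gap between the male and female averages therefore means that at least one assumption fails. Either the two samples do not represent the same closed population, or one sex's reports are inaccurate. Each explanation listed on page 79 is of one of these two kinds.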

The inferences in the last chapters represent the confusion of investi- 
gators not equipped with technical tools, without much experience in 
the analysis of multivariate relations. At some point in the analysis 
of their data for females, the authors were forced to reject their original 
hypothesis. Without a specific formulation of the quantitative relations 
being tested, they may not have recognized what happened. The discussion
of the importance of social and psychological factors, developed mainly in
the final chapters, and the references to it in earlier chapters could easily
have been interlarded.

The book leaves the very strong impression that even Kinsey and 
his associates would not replicate the generalizations in this volume. 
The contradictions in their inferences will serve them well. Several 
other systems of generalizations can be shown to be consistent with 
some of the pronouncements in this volume. 





UNSOLVED PROBLEMS OF EXPERIMENTAL STATISTICS* 


JOHN W. TUKEY
Princeton University 


IT WOULD not be misleading to suggest that there is really only one
unsolved problem of experimental statistics: “How can we recognize
the problems of experimental statistics?” We can recognize a good 
many unsolved problems by accident, but we probably miss many im- 
portant ones for far too many years. Difficulties in identifying prob- 
lems have delayed statistics far more than difficulties in solving prob- 
lems. This seems likely to be the case in the future, too. 

Thus it is appropriate to be as systematic as we can about unsolved 
problems. Any system may be a start toward, or even a partial solution 
of, this problem of recognition. I shall try to do this by stating first 
some principles and then some consequences. I shall strive to phrase 
all these principles as generally as possible, in the hope of prolonging 
their useful life. 

A discussion of examples of these 18 general principles will set forth 
a certain number of unsolved problems, while a list of 51 provocative 
questions poses many more. (This list is admittedly and intentionally 
incomplete.) The account closes with a discussion of the possibility of 
orienting experimental statistics toward problems rather than tech- 
niques. 

SOME GENERAL PRINCIPLES 


If we feel that the detailed problems of experimental statistics arise 
from the interaction of certain general principles among themselves and 
with classes of experiments, it is reasonable to try to state and illustrate 
some of these principles. Before stating the hypergeneral principles on 
which these general principles hang, we need to explain the sense in 
which three terms, ends, areas, and considerations, will be used there and
in the sequel. 

By an end we refer to real purposes of the user of the statistical tech- 
nique. These purposes are often unformulated, and their partial formu- 
lation often requires the statistician to “psychoanalyze” his client (in 
the writer’s view this is one of the most important functions of the 
statistical consultant!). An immediate end is a formalized (and almost 
certainly partial) end such as to describe an appearance (e.g., by a 
point estimate), to make a test of significance, to make a decision, or 
to reach a confidence statement. 





* Prepared in connection with research sponsored by the Office of Naval Research. Presented to 
the American Statistical Association and the Biometric Society 28 December 1953. 
An area is a class of situations with qualitatively similar data, such,
for example, as the class where two sets of observations are presented 
for the comparison of the “typical” values of the corresponding popu- 
lations (means, medians, and the like serve as “typical” values). Within 
an area, different techniques are competitive. Within an area, the his- 
torical, evolutionary, and logical relations of different techniques are 
relatively clear. 

A consideration is a recognition that the world may very well be more 
complex, annoying, and difficult than our earlier techniques had sup- 
posed. Thus we might admit—nay, even take into consideration—the 
possibility that we did not know the variance, that the distribution 
might not be normal, that a certain fraction of the observations are 
affected by blunders, etc. 

The four hypergeneral principles, which may seem harmless until 
we come to their consequences, run as follows: 


(A) Different ends require different means and different logical 
structures. 

(B) In each area, statistical method must and does evolve, mainly 
by adding both immediate ends and considerations. 

(C) While techniques are important in experimental statistics, 
knowing when to use them and why to use them are more 
important. 

(D) In the long run, it does not pay a statistician to fool either him- 
self or his clients. 


We have one hypergeneral principle about logical structure, two about 
statistical method, and one about statisticians. The last may seem to 
be of smallest scope, but when we consider matters carefully, we see 
that (A), (B), and (C) all follow from (D). To insist on one means or 
one logical structure for different ends, or to feel that there is a solution 
to the problems of method, are obvious attempts of the statistician 
to fool himself. 

Clearly, one very general consequence is this: “This complexity of 
experimental statistics will clearly increase.” 

Reducing the generality somewhat, we list some consequences of 
(A), (B), (C), and (D) which are themselves general principles: 


(A1) Statistics needs constantly to recognize new ends for which it 
should try to furnish new means and new logical structures. 

(A2) Statistics needs to avoid over-unification, while encouraging 
coordination. 

(A3) Statistical methods should be tailored to the real needs of the
user.

(A4) Statistics needs continually to compare its own logical structures
with the logical structures currently used or being put into use by
science, engineering, business, and military administration, and other
fields.

(B1) In any area of statistical method, analysis cannot be usefully
considered alone for more than a limited time; after a time appropriate
to the area, design must be brought in.

(B2) There are normal sequences (patterns) of growth in immediate
ends.

(B3) There are normal sequences (patterns) of growth in considerations.

(B4) Growth in immediate ends can sometimes be neglected, but growth
in considerations is almost never to be neglected.

(B5) At any one time, different areas of statistical methodology will
be in different states of evolution, both in immediate ends and in
considerations.

(C1) Competitive statistical techniques indicate a need for manuals
of “when to choose which” and not just selection of “the best”
technique.

(C2) Statisticians owe their clients help in choosing wisely between
high confidence in a short inference and low confidence in a long
inference.

(C3) Techniques of evaluating both the isolated experiment and
history down to date will continue to be useful.

(C4) “What should be done” is almost always more important than
“what can be done exactly.” Hence new developments in experimental
statistics are more likely to come in the form of approximate methods
than in the form of exact ones.

(D1) Statisticians must face up to the existence and varying importance
of systematic errors.

(D2) Statisticians have an obligation to clarify the foundations of
their techniques for their clients.

(D3) Statisticians should be honest and expository about the relation
of precise “assumptions” and exactly “optimum” solutions to real
situations.

(D4) In every statistical area, we almost certainly need methods
admitting one more nuisance parameter, methods of one higher level
of robustness and de-parametrization, methods with both of these
desiderata.

(D5) Statistics must continually study the behavior of its techniques
when their conventional assumptions are not true.
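The last principle in the list asks statistics to study how its techniques behave when their conventional assumptions fail. Monte Carlo sampling is one common way of doing so. The sketch below is minimal, and the function name and settings are hypothetical.

- It estimates how often the nominal 95 per cent Student t interval for a mean actually covers the true mean.
- Each interval is computed from a sample of ten.
- It runs once for normal data and once for skewed (exponential) data.
- The constant 2.262 is the tabulated two-sided 95 per cent t quantile for 9 degrees of freedom.

```python
import math
import random

T_975_DF9 = 2.262  # tabulated two-sided 95% Student t quantile, 9 degrees of freedom
N = 10             # sample size matching that quantile


def t_interval_coverage(draw, true_mean, reps=20_000, seed=7):
    """Fraction of nominal 95% t intervals (samples of N) that contain
    true_mean when each observation is generated by draw(rng)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [draw(rng) for _ in range(N)]
        mean = sum(xs) / N
        var = sum((x - mean) ** 2 for x in xs) / (N - 1)  # unbiased sample variance
        half_width = T_975_DF9 * math.sqrt(var / N)
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    normal = t_interval_coverage(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)
    skewed = t_interval_coverage(lambda rng: rng.expovariate(1.0), true_mean=1.0)
    print(f"normal data:      coverage {normal:.3f}")
    print(f"exponential data: coverage {skewed:.3f}")
```

For normal data the coverage comes out close to the nominal 0.95. For exponential data it falls noticeably below 0.95, which is the kind of behavior such a study is meant to expose.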


ILLUSTRATIVE EXAMPLES 


I will try to illustrate these principles by discussing particular prob- 
lems of experimental statistics which show their impact. These exam- 
ples are not intended to be an exhaustive list. In the light of general 
principle (C), a problem in experimental statistics is not solved by the 
existence of a mathematical statistical paper showing how to find a 
solution, or even by the existence of a technique with tables. There is 
needed an understanding of when and why to use the technique, and 
this understanding must be spread through a certain minimum number, 
sometimes small and sometimes large, of experimental statisticians. 
Thus we may, and should, discuss as unsolved problems some which 
others may consider as already solved. 

(A1) Statistics needs constantly to recognize new ends for which it
should try to furnish new means and new logical structures. A very good 
illustration of this principle is provided by recent developments in con- 
nection with the problem of multiple comparisons. Where one immedi- 
ate end grew a few years ago, three immediate ends flourish today and 
promise to flourish for a long time. These three are: 


(1) The immediate end of providing increments to the store of 
established knowledge. This to be done by the analysis of existent 
data with control of the error rate. The analysis to be formulated 
in confidence or significance statements (cf. Tukey [35, 36, 37], 
Duncan [11, 12, 13] and others). 

(2) The immediate end of providing protection against too bad a 
selection among candidates. This to be done by a sequential de- 
sign of measurement. The result to be selection of the appar- 
ently leading candidate when the “stop rule” takes effect. (cf. 
Bechhofer, Dunnett, Sobel [1, 2, 14]). 

(3) The immediate end of minimizing, in some sense, the sum of the 
costs of experimentation and the costs of poor choice. This is to 
be done by a sequential design of measurement. The result to be 
selection of the apparently leading candidate when the “stop 
rule” takes effect (cf. Grundy, Healy, and Yates [40, 41], 
Sommerville [31]).


In my judgment, there will be a continuing place for all three immediate 
ends. To a reasonable extent these places correspond to the terms 
“basic research,” “developmental research,” and “operations research,” 
[cp. 22]. 
This problem of multiple comparisons is still unsolved as a problem
of multiple comparisons, because the necessary minimum numbers of 
experimental statisticians have not yet acquired a working understand- 
ing of the new immediate ends involved, or of when which technique 
is appropriate. Analogous problems, involving immediate ends which 
differ in analogous ways, are to be expected in more areas of statistics. 

(A2) Statistics needs to avoid over-unification, while encouraging co- 
ordination. It is now known to mathematical statisticians that all the 
currently routine modes of statistical technique—significance state- 
ments, point estimates, confidence statements, etc.—can be formulated 
as decision problems. There is a tendency in the air to do so to an in- 
creasing degree. This may be good mathematical statistics, because it 
may encourage the interchange of useful mathematical techniques 
among the modes. (We are likely to see in due course whether or not 
this is true.) But it would surely be very bad experimental statistics to 
treat all these modes in too unified a way. For then some experimental 
statisticians might be led to forget whether their clients wanted (ex- 
plicitly or implicitly) a decision or a confidence statement, whether 
they had done the experiment as a basis for immediate action or as a 
contribution to knowledge. What more important matter could be for- 
gotten by any experimental statistician? 

In almost every area of experimental statistics, there is a problem of
providing enough different methods to meet the user’s needs. 

(A3) Statistical methods should be tailored to the real needs of the user.
In a number of cases, statisticians have led themselves astray by choos- 
ing a problem which they could solve exactly but which was far from 
the needs of their clients. They could have chosen a problem closer to 
their client’s needs at the price of an approximate solution. In most of 
these cases, tailoring the statistical method to the real needs of the 
client would have meant, and still means, giving up exactness for the 
sake of usefulness. Realistic assessment of value must urge us to make 
such “deals” freely and frequently. 

The broadest class of such cases comes from the choice of significance 
procedures rather than confidence procedures. It is often much easier 
to be “exact” about significance procedures than about confidence pro- 
cedures. By considering only the most null “null hypothesis” many in- 
convenient possibilities can be avoided. If the varieties are not different 
they cannot interact with fertilizers or blocks. If the treatment has no 
effect, we do not have to be concerned with how its effect varies with 
the weight or health of the animal or child. And so on—and on. In these 
examples, it will be clear to many that we are dodging substantial issues. 
But throughout experimental statistics there are many areas with
significance procedures but without confidence procedures. Almost every
one of these areas needs one or more rough confidence procedures. 
Rough procedures will be adequate because the assumptions are not 
likely to be closely true, so that the probability statements need not 
follow precisely from the assumptions either. One or more, because 
techniques based on alternative assumptions give both greater freedom 
of action and greater confidence in results to the analytical statistician. 
Here are many unsolved problems in experimental statistics! 

At another level of unsolution are the problems where the approxi- 
mate mathematical statistics has been done, but no use has been made 
of the results. One outstanding example is the computation by Haldane 
[19] of the effect of non-normality on the variance of the estimated cor- 
relation coefficient. Who has put this to use? Yet it surely is enough to 
support an empirical robustification procedure involving an effective 
number of pairs of observations. There must be many more examples 
like this, where the results have not been carried through to practical 
usability. 
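As a sketch of what such a procedure might look like (an assumption-laden illustration, not Haldane's result and not a procedure proposed in the text), one can compute the usual Fisher z limits for a correlation coefficient but replace the number of pairs n by an effective number reflecting the inflation of variance under non-normality. Here the inflation factor is a hypothetical, user-supplied number; a real procedure would have to estimate it, presumably from fourth-moment information of the kind Haldane computed.

```python
import math
from statistics import NormalDist


def correlation_limits(r, n, inflation=1.0, confidence=0.95):
    """Fisher z limits for a correlation, using an effective number of pairs.

    inflation is a hypothetical, user-supplied factor: the assumed ratio of
    the actual variance of z = atanh(r) to its normal-theory value 1/(n - 3).
    With inflation=1.0 these are the usual normal-theory limits.
    """
    n_eff = (n - 3) / inflation + 3          # effective number of pairs
    z = math.atanh(r)
    half = NormalDist().inv_cdf(0.5 + confidence / 2) / math.sqrt(n_eff - 3)
    return math.tanh(z - half), math.tanh(z + half), n_eff


print(correlation_limits(0.6, 50))                 # normal theory
print(correlation_limits(0.6, 50, inflation=1.5))  # assumed heavier tails
```

The effective number of pairs enters only through the standard error of z, so the same device could be attached to any procedure built on that standard error.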

(A4) Statistics needs continually to compare its own logical structures 
with the logical structures currently used or being put into use by science, 
engineering, business, and military administration, and other fields. 
We can indicate an unsolved problem here which is not likely to be 
solved in the near future. This is the problem of formalizing some 
further part of the process of developing new scientific concepts and 
new scientific theories. Only the most elementary steps in this process 
have been formalized (in terms of the analysis of conventional types 
of experiments, of the testing of goodness of fit, and the like). Undoubt- 
edly some, at least, of the less elementary steps can be formalized, but 
how? And which ones? 

This is a vague and diffuse problem, but it is a very important prob- 
lem indeed. Some would construe it as a problem for philosophers, but 
I feel that it will require quantitative philosophers (that is, experimental 
statisticians). 

(B1) In any area of statistical method, analysis cannot be usefully 
considered alone for more than a limited time; after a time appropriate 
to the area, design must be brought in. The second and third types of 
multiple comparison procedures cited above (A1) furnish an excellent 
example of the need for design. For the immediate ends involved the 
only action, once the measurements are made, is to take the seemingly 
best candidate. That this is reasonable is, and has been, clear to all. 
Even a very moderate degree of sophistication was barred from these 
situations until the question of when to stop taking measurements was
introduced. There must now be many similar cases in other areas today 
where design considerations have not yet been properly introduced. 

(B2) There are normal sequences of growth in immediate ends. One 
natural sequence of immediate ends follows the sequence: 


(1) Description
(2) Significance statements
(3) Estimation
(4) Confidence statement
(5) Evaluation


In the case of a double binomial the successive levels are illustrated by
the following sequence of statements.


(1) The percentage of success observed among A’s was higher than 
among B’s. 

(2) The percentage of success among A’s was significantly greater 
than among B’s. 

(3) The observed percentage of success among A’s exceeded that 
among B’s by a difference of 0.28 in logits. (Or, perhaps, by 15 
per cent.) 

(4) The difference in logits corresponding to the increased percentage 
of success in A’s as against B’s is between 0.18 and 0.43 with 
95 per cent confidence. (Between 10 per cent and 22 per cent 
with 95 per cent confidence, perhaps.) 

(5) Considering both this experiment and all the observations
reported by Smith, Jones, Brown, Robinson, and their coworkers,
the indicated difference in logits lies between 0.32 and 0.36 with 
5 per cent diffidence (the difference in per cent lies between 17 
and 19, perhaps). 
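Levels (3) and (4) of this sequence can be sketched for the double binomial. The code below is a minimal illustration rather than the author's procedure: it assumes the common large-sample (Wald) variance for a difference of log-odds, and the counts and the function name `logit_difference` are hypothetical.

```python
import math
from statistics import NormalDist


def logit_difference(successes_a, n_a, successes_b, n_b, confidence=0.95):
    """Level (3) estimate and level (4) limits for a difference in logits.

    Assumes the large-sample (Wald) variance 1/x + 1/(n - x) for each
    log-odds; every cell count must be non-zero.
    """
    def log_odds(x, n):
        return math.log(x / (n - x))

    estimate = log_odds(successes_a, n_a) - log_odds(successes_b, n_b)
    variance = (1 / successes_a + 1 / (n_a - successes_a)
                + 1 / successes_b + 1 / (n_b - successes_b))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(variance)
    return estimate, (estimate - half_width, estimate + half_width)


# Hypothetical counts: 60 of 100 A's and 45 of 100 B's succeed.
est, (lower, upper) = logit_difference(60, 100, 45, 100)
print(f"difference in logits {est:.3f}, 95% limits ({lower:.3f}, {upper:.3f})")
```

With small counts an exact or adjusted interval would be preferable; the Wald form is used here only for brevity. Level (5) has no comparable few-line sketch, which is the point made below.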


The order of (2) and (3) is not nearly so well defined as that of any other
pair. In some areas, and to some experimental statisticians, either order
would be wrong. We have chosen this order for definiteness and not 
with sureness. 

In the actual case of the double binomial, almost every experimental 
statistician can handle (1), (2), and (3) easily. Some are not perturbed 
by (4) and of these most but not all can handle (4) correctly. No one, so 
far as the writer knows, can treat (5) adequately. In other areas we may
stop at level (1), at level (2), at level (3), or at level (4), but in almost 
every case there is a next level which represents an unsolved problem. 

How to operate at level (5) seems to represent an unsolved problem
in many areas. It is a real and important problem, and one whose
solution should not be approached flippantly or lightly. Either the classical
example of the charge on the electron (as of 1938) or the current exam- 
ple of the heat of sublimation of carbon (which has not improved during 
the last 25 years) shows that the proper evaluatory answer may be: 
“The available determinations fall into two systematically different 
groups, which correspond to values between A and B and between C 
and D, respectively, and which we are confident cannot be brought into 
agreement without the introduction of a new systematic adjustment.” 
How many other unusual (from the point of view of formal statistics 
as found in the books) kinds of conclusions are reasonable in evaluation 
of all available data? This is not an easy question, but its solution (at 
least its partial solution) is a prerequisite to that of any problem of 
evaluation. 

There are, of course, other normal sequences of immediate ends, 
leading mainly through various decision procedures, which are appro- 
priate to development research and to operations research, just as the 
sequence we have just discussed is appropriate to basic research. (Here 
“There are, of course” means “There must be! We are sure they exist, 
but we cannot specify them today.”) 

(B3) There are normal sequences of growth in considerations. The area 
of comparing the typical values of two populations with the aid of a sample
drawn from each illustrates a customary sequence of evolution in con- 
siderations quite nicely. The sequence runs: 


(1) Normal populations of equal and known variance. 
(2) Normal populations of general (i.e., probably unequal) and 
known variances. 
(3) Normal populations of identical but unknown (but estimated) 
variance. 
(4) Normal populations of general and unknown (but estimated) 
variances. 
(5) Symmetrical populations of unknown shape and unknown but 
equal variance. 
(6) Symmetrical populations of the same unknown shape but gen- 
eral and unknown variances. 
(7) Symmetrical populations of unknown shapes and variances. 
(8) Populations of unknown but equal shape and variance. 
(9) Populations of the same unknown shape and unknown and 
general variances. 
(10) Populations of general and unknown shapes and variances. 
Here we have exemplified the growth in considerations like these:


(a) The scale of the populations might be different.
(b) The variance might not be known.
(c) The symmetrical populations might not be normal.
(d) The populations might not have the same shape.
(e) The populations might not be symmetrical.


It is by considering such unpleasant possibilities that we sharpen our 
techniques and strengthen our understanding. 

The normal distribution suffices for levels (1) and (2), while level (3) 
requires Student’s t. The next level, (4), provides the Fisher-Behrens
problem, while (5) seems to be the likely end of the direct application 
of Wilcoxon-Walsh [38-39] procedures (so far only applied to the 
matched observation case). Beyond this point the terra is rather in- 
cognita, but we may note that through level (7) we need to make no 
distinction between medians and means, while simple rank order pro- 
cedures are exact through level (8). 

Not only does this area—and remember that it is one of the most 
carefully worked over of all areas—provide a good example of a normal 
sequence of growth in considerations, but it also provides many exam- 
ples of unsolved problems. The Fisher-Behrens problem arises quite 
early, at only level (4) in the list, yet today the Fisher solution is 
known not to be unique [33], even in the domain of fiducial probability, 
while the Aspin-Welch solution may or may not correspond to an exact 
solution as well as an asymptotic one. What should a poor experimental 
statistician do?
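The level (3) and level (4) statistics can be put side by side. In the sketch below, which uses hypothetical data, `pooled_t` is the usual Student statistic assuming a common variance, and `welch_t` is Welch's statistic with Satterthwaite's approximate degrees of freedom, a widely used approximation closely related to, though not identical with, the Aspin-Welch solution. It says nothing about the exactness questions raised above.

```python
from statistics import mean, variance


def pooled_t(a, b):
    """Level (3): Student's t, assuming a common (unknown) variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = (sp2 * (1 / na + 1 / nb)) ** 0.5
    return (mean(a) - mean(b)) / se, na + nb - 2


def welch_t(a, b):
    """Level (4): Welch's statistic with Satterthwaite's approximate df."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical samples with visibly different spreads.
a = [10.1, 9.8, 10.4, 10.0, 9.9]
b = [11.2, 8.7, 12.5, 9.1, 10.8, 13.0, 7.9]
print("pooled:", pooled_t(a, b))
print("Welch: ", welch_t(a, b))
```

With equal sample sizes the two statistics coincide and only the degrees of freedom differ; with unequal sizes and unequal spreads they can disagree substantially.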

Who has good-looking solutions for the problems posed by (4), (7), 
(9), or (10)? Who knows how the solutions for level (4) just mentioned 
behave as to error rate when (5), (6), or (7) represents the facts? How 
do the solutions for level (4) behave as to power when either (4) or (3) 
represents the facts? And the reader can add many more. 

The foreseeable, normal growth in considerations will provide un- 
solved problems for a long time to come in almost every area of statistics. 

(B4) Growth in immediate ends can sometimes be neglected, but growth
in considerations is almost never to be neglected. We can use the two- 
sample area to illustrate this principle also. If we had a clear and reason- 
able solution to the Fisher-Behrens problem, very few experimental 
statisticians would dare ignore it. But many are content to teach sig- 
nificance testing without confidence procedures. (The young chemist 
who can analyze the variance of Latin squares and snatch out single 
degrees of freedom with zest and ease, but who cannot use Student’s t
to set confidence limits on A − B, because no one ever mentioned it to
him, is a poor witness to the teaching of chemists by statisticians!) 
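The confidence limits on A − B that the young chemist never met take only a few lines. The sketch below assumes level (3) of the sequence above (normal populations with a common unknown variance). To stay within the Python standard library, the critical value of t is passed in from a table (about 2.101 for 18 degrees of freedom at 95 per cent) rather than computed; the function name and data are hypothetical.

```python
from statistics import mean, variance


def student_limits(a, b, t_crit):
    """Confidence limits for mean(A) - mean(B) using the pooled variance.

    t_crit is the two-sided critical value of Student's t with
    len(a) + len(b) - 2 degrees of freedom, looked up from a table.
    """
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = (sp2 * (1 / na + 1 / nb)) ** 0.5
    diff = mean(a) - mean(b)
    return diff - t_crit * se, diff + t_crit * se


# Hypothetical yields, 10 per treatment: 18 df, so t (95%) is about 2.101.
yields_a = [21, 23, 19, 24, 22, 20, 25, 23, 22, 21]
yields_b = [19, 20, 18, 22, 21, 17, 20, 19, 21, 18]
print(student_limits(yields_a, yields_b, 2.101))
```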

(B5) At any one time, different areas of statistical methodology will be
in different states of evolution, both in immediate ends and in
considerations. We have only to contrast the two-sample area with the
m × n contingency-table area, or the correlation-coefficient area with the
measures-of-nonnormality-for-time-series area to find application of 
this general principle. 

(C1) Competitive statistical techniques indicate a need for manuals of
“when to choose which” and not just selection of “the best” technique. Our 
discussion of the two-sample area should have made it clear that what 
is needed here is a guide to the various techniques explaining why and 
when to use them. No selection of a single “best” technique is going to 
be satisfactory. 

Another widely separated area which illustrates the principle nicely 
is the response maximization area. Here we have a spectrum of sugges- 
tions from the carefully thought-out “circle and bee-line (possibly re- 
peated) and then survey” technique of Box and Wilson [5] to the creep- 
ing technique of Friedman and Savage [16] and the sophisticated but 
so far one-dimensional technique of Robbins and Monro [30]. I am sure 
that all of those named have their place, as do, no doubt, some of the 
intermediate points in the spectrum. I have, indeed, some idea of where 
these places are. But I would like to know far more precisely where 
these places are and why. (You couldn’t possibly sell me a single best 
method!) 

(C2) Statisticians owe their clients help in choosing wisely between high 
confidence in a short inference and low confidence in a long inference. In 
the analysis of three and more way analyses of variance, there arises 
the problem of choosing the correct error term (e.g. Goulden [17]). 
This is the first big problem in the analysis of variance, and one that is 
still very effective in separating the statisticians from the children. If 
one classification is years, one choice can be put into words as follows: 
Will you have differences in average performance averaged over these 
particular years, with narrow confidence limits, or will you have dif- 
ferences in average performance, averaged over a population of years of 
which these years are a sample, with much broader confidence limits?
With regard to this particular example, most experimental statisticians 
are clear and effective. Thus, it may be a solved problem. But in many 
other areas the corresponding problem is not only unsolved but 
unposed! 
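The choice between the two error terms can be made concrete. In the sketch below, a minimal illustration on hypothetical data rather than a general recipe, each year supplies a few replicate plots of varieties A and B. The short inference about these particular years uses the within-year replicate error; the long inference about a population of years uses the year-to-year scatter of the A − B differences, which for two varieties in a balanced layout is equivalent to using the variety × year interaction as error.

```python
from statistics import mean, variance


def two_error_terms(data):
    """Mean A - B difference with 'short' and 'long' standard errors.

    data maps each year to (replicates of A, replicates of B); the layout
    is assumed balanced (same number of replicates everywhere).
    """
    diffs, cell_variances = [], []
    for a_reps, b_reps in data.values():
        diffs.append(mean(a_reps) - mean(b_reps))
        cell_variances.extend([variance(a_reps), variance(b_reps)])
    years = len(diffs)
    reps = len(next(iter(data.values()))[0])
    # Short inference: these particular years, within-year replicate error.
    ms_within = mean(cell_variances)
    se_short = (2 * ms_within / (reps * years)) ** 0.5
    # Long inference: a population of years, year-to-year scatter of A - B.
    se_long = (variance(diffs) / years) ** 0.5
    return mean(diffs), se_short, se_long


trials = {
    "year 1": ([12.1, 12.4, 11.9], [11.0, 11.3, 10.8]),
    "year 2": ([9.8, 10.1, 10.0], [9.9, 10.2, 9.7]),
    "year 3": ([14.2, 14.0, 14.5], [12.1, 12.5, 12.2]),
}
diff, short_se, long_se = two_error_terms(trials)
print(f"A - B = {diff:.3f}; SE (these years) {short_se:.3f}; "
      f"SE (population of years) {long_se:.3f}")
```

Here the variety difference changes a good deal from year to year, so the long inference carries much wider limits than the short one.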

Some have queried the use of “short” and “long” in this context, and 
have tried to relate this choice to that of the proper “breadth” of
foundation (the advantages of sufficiently broad basis of inference 
have, of course, been ably discussed by Fisher [15, Section 39]). It is 
important to avoid possible confusion in this regard. Considerations of 
breadth arise during the design of an experiment, while considerations 
of length arise in its interpretation. Thus an experiment to compare 
certain psychological characteristics within brother-sister pairs would 
be broadened as to foundation if changed from 50 pairs drawn from 
Indiana to 5 subgroups of ten pairs each from 5 geographically and 
culturally separated areas. For either experiment, there will be a prob- 
lem of length of inference! Will we make statements about the average 
over the 50 pairs of perfectly measured differences, or shall we make 
statements concerning the average differences in larger populations of 
which these 50 pairs, or these 5 sets of 10 pairs are a sample or samples? 
The two questions are quite separate. 

(C3) Techniques of evaluating both the isolated experiment and history 
down to date will continue to be useful. There are many experimental 
procedures that involve either the regular measurements of control 
specimens or the regular use of special calibration procedures. After a 
new calibration, should we use the old calibration? Should we use only 
the new calibration? Or should we combine old and new values? With 
what relative weights? This is a recurrent problem, one whose solution 
might improve measurement accuracies per dollar in a wide variety of 
applications. But who has the solution? Or, better, “the solutions,”
because the path is long from the isolated group of occasional
measurements to the production line producing measurements steadily.
Different locations along this path will require different solutions. Work on
this problem has undoubtedly been hampered by the tradition of the 
self-contained experiment. But many measurement procedures are far 
from self-contained experiments. 
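One common answer to "with what relative weights?", offered here only as a hedged sketch and not as the solution the text asks for, is inverse-variance weighting. It presumes that old and new calibrations measure the same quantity with no systematic shift between them, which is exactly what is in doubt after a recalibration, so the sketch also reports a standardized discrepancy between the two values. Names and numbers are hypothetical.

```python
import math


def combine_calibrations(old_value, old_se, new_value, new_se):
    """Inverse-variance weighted combination of an old and a new calibration.

    Returns (combined value, its standard error, standardized discrepancy).
    The weights are appropriate only if both values estimate the same
    quantity; the discrepancy is a rough check on that assumption.
    """
    w_old, w_new = 1 / old_se ** 2, 1 / new_se ** 2
    combined = (w_old * old_value + w_new * new_value) / (w_old + w_new)
    combined_se = math.sqrt(1 / (w_old + w_new))
    discrepancy = (new_value - old_value) / math.sqrt(old_se ** 2 + new_se ** 2)
    return combined, combined_se, discrepancy


# Hypothetical calibration factors with their standard errors.
value, se, z = combine_calibrations(1.000, 0.004, 1.006, 0.003)
print(f"combined {value:.4f} +/- {se:.4f}; discrepancy {z:.2f} SEs")
```

A large discrepancy argues against pooling at all; how large is "large", and how to weight older calibrations further back, are the open parts of the problem.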

Like unto this first example is a second. Most procedures of statistical 
analysis today include a measure of spread in this particular experi- 
ment, be it an estimated variance, a total or mean range, or the mean 
square in a certain line of the analysis of variance. Usually there is past 
evidence as to the variability in question. In assessing the results of a 
particular experiment shall we use only the estimate from within the 
experiment? Only past history? Some combination of the two? Which 
combination? 
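One candidate for "some combination of the two", valid only under the strong assumption that past and present share a single underlying variance, is a degrees-of-freedom-weighted pool of the two mean squares. The function name and figures below are hypothetical; how far back to look, and what to do when the assumption fails, remain the unsolved parts.

```python
def pooled_variance(within_ms, within_df, past_ms, past_df):
    """Pool a within-experiment mean square with a historical one.

    Each mean square is weighted by its degrees of freedom, which is
    defensible only if both estimate the same underlying variance.
    Returns (pooled mean square, total degrees of freedom).
    """
    total_df = within_df + past_df
    return (within_df * within_ms + past_df * past_ms) / total_df, total_df


# Hypothetical: 6 df inside the experiment, 60 df of past history.
print(pooled_variance(2.4, 6, 1.5, 60))
```

Here the past dominates the pooled value; whether it should is exactly the question posed above.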

This problem of how far to look back is widespread and unsolved. 
A solution might allow us to narrow the wide confidence limits that go 
with wide apparent variation and to widen the falsely narrow ones 
which go with narrow apparent variation. This would equalize our
exposure to error, and tend to let us make sharper statements on the 
average. Again the philosophy of “each experiment to itself” has stood 
in the way. But why should we allow this to go on? (Of course the phi- 
losophy of “each experiment to itself” is important, of course it must be 
widely used, but neither always nor everywhere! Just another example
of (A2) and (C1).) 

(C4) “What should be done” is almost always more important than 
“what can be done exactly.” Hence new developments in experimental statis- 
tics are more likely to come in the form of approximate methods than in the 
form of exact ones. Once upon a time the calculation of the first four 
moments was an honorable art in statistics. Then came those who could 
calculate the exact distributions of simple expressions. And because 
their results were “exact” they took over the place of honor. (Partly 
too, perhaps, because the moment calculators failed on occasion to 
transform their expressions wisely before calculating the moments.) 
And it came to be infra dig to find moments. In seminars one heard 
A’s achievement of calculating the first four moments for n’s up to 12 
belittled in comparison with B’s proof that the distribution tended to 
normality as n tended to infinity. Yet which result was more useful to 
the experimental statistician with experimental data for n equal to 5, 
10, 20, or even 50? Probably the first four moments.
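The first four moments are conveniently summarized as mean, variance, skewness, and kurtosis. The sketch below computes them from a list of values, using simple moment ratios without small-sample corrections (an assumption made for brevity). The moments in the text would come from algebra; here, as a stand-in chosen only to keep the example short, they are estimated by simulation for the range of n = 5 normal observations.

```python
import random


def first_four_moments(values):
    """Mean, variance, skewness, and excess kurtosis of a list of values.

    Central moments are divided by n; no small-sample corrections are made.
    """
    n = len(values)
    m1 = sum(values) / n
    m2 = sum((x - m1) ** 2 for x in values) / n
    m3 = sum((x - m1) ** 3 for x in values) / n
    m4 = sum((x - m1) ** 4 for x in values) / n
    return m1, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Moments of the range of n = 5 standard normal observations, by simulation.
rng = random.Random(1)
ranges = []
for _ in range(10000):
    sample = [rng.gauss(0, 1) for _ in range(5)]
    ranges.append(max(sample) - min(sample))
print(first_four_moments(ranges))
```

Four such numbers are often enough to fit a Pearson-type or similar approximating distribution, which is how moments let the work go on without an exact distribution.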

If the moments had been on MacArthur’s staff, their parting state- 
ment would have read “we shall return!” But when? I think that it is 
high time to bring the calculation of moments back to that high estate 
which it deserves. We shall always have to deal with messy expressions,
whose exact distribution will be found by no one, at least for a long 
time. Moments may allow us to get on with the work. If they do allow 
us to do this, let us use them. 

The variability of estimates of spectra of time series provides a case 
in point. Even with the normality assumption, the exact distribution is 
not going to be easily manageable. Yet the first two moments can be 
found, and found with very useful results. Considerable recent progress 
in the analysis of physical time series rests on those two moments [e.g. 
27, 29]. 
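One way two moments are put to work, sketched here under the usual approximation and not as the method of the works cited, is to match a spectrum estimate's mean and variance to a multiple of chi-square, giving equivalent degrees of freedom ν = 2·mean²/variance. For a plain average of m raw periodogram ordinates of Gaussian white noise, each ordinate has variance equal to the square of its mean, so ν = 2m. The function names and the band-averaging smoother below are illustrative assumptions.

```python
import cmath
import math
import random


def periodogram(series):
    """Raw periodogram |DFT|^2 / n at Fourier frequencies k = 1 .. n//2 - 1.

    A plain O(n^2) DFT, adequate for a short illustrative series.
    """
    n = len(series)
    centre = sum(series) / n
    centred = [x - centre for x in series]
    ordinates = []
    for k in range(1, n // 2):
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        ordinates.append(abs(total) ** 2 / n)
    return ordinates


def band_average(ordinates, m):
    """Average non-overlapping groups of m adjacent ordinates."""
    return [sum(ordinates[i:i + m]) / m
            for i in range(0, len(ordinates) - m + 1, m)]


def equivalent_dof(mean, variance):
    """Degrees of freedom of the scaled chi-square matching two moments."""
    return 2 * mean ** 2 / variance


# Unit-variance Gaussian white noise: its spectrum is flat at 1 in this scaling.
random.seed(0)
noise = [random.gauss(0, 1) for _ in range(256)]
smoothed = band_average(periodogram(noise), m=8)

# Averaging m = 8 ordinates of mean S and variance S**2 gives variance S**2 / 8.
print(equivalent_dof(1.0, 1.0 / 8))
print(sum(smoothed) / len(smoothed))  # should be near 1
```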

(D1) Statisticians must face up to the existence and varying importance 
of systematic errors. The failure of the statistician to take sufficient 
cognizance of systematic errors has been in part an escape phenomenon. 
To a man looking hopefully for a way to shorten a confidence interval 
by 7 per cent of its length by ingenious devices, the thought of syste- 
matic errors which might make it twice as long comes as a severe shock, 
and all men try to avoid shocks. Perhaps, too, the recent development
of statistics in connection with the uncomfortable sciences like agri- 
culture and biology—uncomfortable because unsystematic errors tend 
to be so large—may have much to do with this. Only the sampling sur- 
vey statisticians, with their recent treatment of “non-sampling errors” 
seem to be facing up to the existence of systematic errors. 

What should experimental statistics as a whole do about systematic 
errors? Should we change from “95 per cent confidence” to “5 per cent 
diffidence” and impress on our clients that more diffidence has to be 
added because of systematic errors? Have we been overselling our 
clients on the confidence with which they should accept the results of 
our analyses? Is this why physics is the most resistant of all the sciences
to the penetration of statistics? 

Some there will be who will claim that the old ways are good enough, 
since in comparative experiments the systematic errors tend to be very 
much smaller than in absolute experiments. Very much smaller, but 
not zero, is the answer. (The experimental statistician dare not shrink 
from the war cry of the analyst “Only a fool would use it, but it’s 
better than we used to use!” but on the other hand, he dare not take
the motto as a permanent excuse for sloppy methods). Here is a real 
unsolved problem of experimental statistics: What about systematic
errors? 

(D2) Statisticians have an obligation to clarify the foundations of their 
techniques for their clients. I have the impression that, at the time the 
analysis of variance was introduced, the practice of adjusting yields for 
the apparent fertility of blocks was, or would have been, regarded with 
suspicion as “cooking the observations.” Yet the analysis of variance,
which is quite equivalent in its results, seems to have spread without
opposition of this sort. Was this because the arithmetic was so compli- 
cated that the poor client didn’t understand what was going on? I am 
sorely afraid that this was the case. 

At the beginning, it may have paid the statisticians to fool their clients 
about the analysis of variance, but does it today? I give vent to a 
hearty “no!”, feeling that many clients get far less out of such analyses 
than they should, because they don’t understand what is going on. 
How many of your clients really understand what sorts of additive 
decompositions of the observations underlie the analyses of variance 
you proudly return to them? 
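The additive decomposition behind a randomized-blocks analysis of variance can be shown directly: each observation splits into grand mean, treatment effect, block effect, and residual, and the sums of squares of those pieces add up to the total. The sketch below uses hypothetical yields and plain Python. It also checks the point about "cooking": the scatter within treatments of yields adjusted for the apparent fertility of their blocks is exactly the residual sum of squares of the analysis of variance.

```python
from statistics import mean


def rcb_decomposition(yields):
    """Split yields[i][j] (treatment i, block j) into additive pieces.

    Returns grand mean, treatment effects, block effects, residuals, and
    the treatment, block, and residual sums of squares.
    """
    t, b = len(yields), len(yields[0])
    grand = mean(v for row in yields for v in row)
    treat = [mean(row) - grand for row in yields]
    block = [mean(yields[i][j] for i in range(t)) - grand for j in range(b)]
    resid = [[yields[i][j] - grand - treat[i] - block[j] for j in range(b)]
             for i in range(t)]
    ss_treat = b * sum(e ** 2 for e in treat)
    ss_block = t * sum(e ** 2 for e in block)
    ss_resid = sum(r ** 2 for row in resid for r in row)
    return grand, treat, block, resid, (ss_treat, ss_block, ss_resid)


# Hypothetical yields: 3 treatments (rows) in 4 blocks (columns).
yields = [[30, 34, 29, 36],
          [28, 33, 27, 35],
          [33, 37, 30, 40]]
grand, treat, block, resid, (sst, ssb, sse) = rcb_decomposition(yields)
total = sum((v - grand) ** 2 for row in yields for v in row)
print(f"treatments {sst:.2f} + blocks {ssb:.2f} + residual {sse:.2f} "
      f"= total {total:.2f}")

# Adjust each yield for its block effect, then measure within-treatment scatter.
adjusted = [[v - block[j] for j, v in enumerate(row)] for row in yields]
ss_adjusted = sum((v - mean(row)) ** 2 for row in adjusted for v in row)
print(f"within-treatment SS of block-adjusted yields: {ss_adjusted:.2f}")
```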

How to explain to the client what the analysis of variance is about? 
This is surely a problem of experimental statistics. Even if I should 
know a large part of the answer, as I hope I do, it is an unsolved
problem, since the answer is not at the finger tips of enough experimental
statisticians. 

In how many other areas are we losing by fooling our clients? 

(D3) Statisticians should be honest and expository about the relation of
precise “assumptions” and exactly “optimum” solutions to real situations. 
As an example here, let us take a field currently under development. 
Box and his coworkers have been, and continue to be, active in the 
development of designs for the estimation of all the zeroth, first, and 
second degree coefficients in a second degree response surface, where 
the response is a function of 1, 2, 3, 4, 5, etc., variables. In the process 
he is resting heavily on such “exact” concepts as “orthogonality” and
“estimating all coefficients with the same variance.” He is well aware 
that, because of the way the designs are to be used, these “exact” math- 
ematical properties are not likely to correspond to any physical reali- 
ties, that, in any particular situation, there is no reason to believe that 
the “exactly optimum” design is appreciably better than any nearby 
design. But even if “exactly optimum” does not mean what it says, it 
may well mean “likely to be quite useful,” as in this case it does.

How many of the potential users of such designs will understand that 
“exactly optimum” doesn’t mean what it says? All too few, and for the 
others we statisticians are likely to be to blame. We have pushed 
“optimum” procedures for one reason or another, without adequate 
warning about idealizations and the real world. As a psychologist once 
said when Mosteller discussed “inefficient statistics” before the Eastern 
Psychological Association, “inefficient statistics, but efficient
statisticians”! How often do we miss the chance to have “non-optimal
techniques, but optimal statisticians” apply to us?

Another example of the same sort looms large on the horizon. It 
concerns all of bioassay and much of the transformation of counted 
data (a subject about which there are whispers of new discussion). Little 
attention has been paid to gains or losses from “exact” maximum likeli- 
hood, minimum chi-square, or unbiased solutions of bioassay problems. 
Much attention has been spent in getting these “exact” solutions. Does 
it matter whether we use logits, probits, or anglits? How much does it 
matter? (On this there is some information.) What happens if a little 
non-binomial fluctuation creeps in? Have we been realistic about any- 
thing in this whole area? Clearly there are many unsolved problems of 
experimental statistics here. 
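The three transformations named above can be compared on a common footing. The sketch below (standard library only; the rescaling to unit slope at p = 1/2 is a choice made here for illustration, not taken from the text) evaluates the logit, the probit, and the angular transformation ("anglit") over a range of proportions, which makes it easy to see where they agree and where they part company.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1 - p))


def probit(p):
    # Normal equivalent deviate (without the historical +5 offset).
    return NormalDist().inv_cdf(p)


def anglit(p):
    # Angular transformation arcsin(sqrt(p)), centred so anglit(0.5) == 0.
    return math.asin(math.sqrt(p)) - math.pi / 4


TRANSFORMS = {"logit": logit, "probit": probit, "anglit": anglit}
# Slopes at p = 0.5: logit 4, probit sqrt(2*pi), anglit 1.
SCALES = {"logit": 4.0, "probit": math.sqrt(2 * math.pi), "anglit": 1.0}

for p in (0.02, 0.1, 0.3, 0.5, 0.7, 0.9, 0.98):
    cells = "  ".join(f"{name}={f(p) / SCALES[name]:+.3f}"
                      for name, f in TRANSFORMS.items())
    print(f"p={p:.2f}  {cells}")
```

After rescaling, the three agree closely between about p = 0.3 and 0.7 and separate in the tails, which is where a bioassay's extreme doses put most of the weight of the choice.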

(D4) In every statistical area, we almost certainly need methods ad- 
mitting one more nuisance parameter, methods of one higher level of robust- 
ness and de-parametrization, methods with both of these desiderata. Here 
we may turn the carpet back to see the dirt: it is a large carpet trying
to cover much dirt. We have a reasonably wide variety of procedures 
for analyzing counted data which assume pure binomial variation. 
Contingency tables, chi-square, and ω² goodness of fit tests,
Kolmogoroff-Smirnoff bounds on the population distribution, all-or-none
bioassay, and so on. The list is long. Many of the techniques are important.
All of them need procedures admitting the possibility of additional non- 
binomial variation. We gave up long ago assuming that we knew the 
variance of yield of soy bean plots of given size—even though we had 
empirical data on it. We blithely assume that we know the variance of 
preparing a dilution and the variance of death among guinea pigs in- 
jected with a single dilution—we assume one to be zero and the other 
to be binomial! We would criticize the varietal trial without an internal 
estimate of error, yet we look silently on the bioassay without one. 
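An internal estimate of error for counted data can start from comparing the observed scatter of replicate proportions with the scatter that pure binomial variation predicts. The sketch below computes the usual binomial heterogeneity chi-square and a dispersion ratio (chi-square divided by its degrees of freedom). Under pure binomial variation the ratio should be near 1; values well above 1 point to the additional non-binomial variation discussed above. The data and function name are hypothetical.

```python
def binomial_dispersion(successes, trials):
    """Heterogeneity chi-square for k replicate binomial counts.

    successes[i] out of trials[i]; tests the common-p binomial model.
    Returns (chi-square, degrees of freedom, dispersion ratio).
    """
    p = sum(successes) / sum(trials)
    chi2 = sum((x - n * p) ** 2 / (n * p * (1 - p))
               for x, n in zip(successes, trials))
    df = len(successes) - 1
    return chi2, df, chi2 / df


# Hypothetical: deaths among 20 guinea pigs in each of 6 groups
# injected with what is nominally the same dilution.
deaths = [4, 11, 7, 15, 3, 10]
print(binomial_dispersion(deaths, [20] * 6))
```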

Perhaps in part we have not attacked these problems because of their 
resemblance to those cited under (C3). Perhaps we have not attacked 
them because their consideration would disturb our clients’ techniques 
or bring to light new sources of variation. But whatever the reasons,
they do not seem valid to me today. 

Here are many unsolved problems in experimental statistics. 

(D5) Statistics must continually study the behavior of its techniques 
when their conventional assumptions are not true. I have touched on some 
minor examples of this principle. Let me cite a few major ones. 

Many statistical techniques assume homogeneity of variance; each of
them needs a related technique assuming inhomogeneity of variance.
How do the present techniques stand up under inhomogeneity?
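How present techniques stand up under inhomogeneity of variance is a natural question for simulation. The sketch below (standard library only; the sample sizes, standard deviations, and the table value 2.101 for 18 degrees of freedom are illustrative choices) estimates the actual type I error rate of the pooled two-sample t test at a nominal 5 per cent.

```python
import random


def mean_and_variance(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pooled_t_stat(a, b):
    """Student's two-sample t with the pooled (common) variance estimate."""
    (ma, va), (mb, vb) = mean_and_variance(a), mean_and_variance(b)
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def rejection_rate(n_a, sd_a, n_b, sd_b, t_crit, reps=4000, seed=1):
    """Fraction of simulated null cases (equal means) with |t| > t_crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(0, sd_a) for _ in range(n_a)]
        b = [rng.gauss(0, sd_b) for _ in range(n_b)]
        if abs(pooled_t_stat(a, b)) > t_crit:
            rejections += 1
    return rejections / reps


# 5 + 15 observations give 18 df; the two-sided 5% point of t is about 2.101.
print("equal spreads:  ", rejection_rate(5, 1.0, 15, 1.0, 2.101))
print("unequal spreads:", rejection_rate(5, 3.0, 15, 1.0, 2.101))
```

When the smaller sample comes from the more variable population, the pooled test rejects a true null hypothesis far more often than 5 per cent of the time; reversing the roles makes it conservative instead. Swapping in non-normal generators addresses the companion question about non-normality in the same way.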

Many statistical techniques utilize a normality assumption almost 
exclusively as a means for predicting the stability of estimated vari- 
ances. Each needs a related robustified technique which allows for the 
effects of non-normality on this stability. How do the present tech- 
niques stand up under non-normality? 

Many discussions of efficiency of estimation assume an underlying 
normal distribution. Each needs related studies assuming suitably 
varied nonnormal distributions. 

How many unsolved problems do we need? 


SOME PROVOCATIVE QUESTIONS 


In providing examples of the various general principles, I have in- 
dicated a number of unsolved problems of experimental statistics, but 
there are a few more at the tip of the tongue. In this section I shall seek 
to provide a few more, mostly indirectly, by trying to ask some
provocative questions.

(1) What are we trying to do with goodness of fit tests? (Surely not to 
test whether the model fits exactly, since we know that no model fits 
exactly!) What then? Does it make sense to lump the effects of syste- 
matic deviations and over-binomial variation? How should we express 
the answers of such a test? 

(2) Why isn’t someone writing a book on one- and two-sample tech- 
niques? (After all, there is a book being written on the straight line!) 
Why does everyone write another general book? (Even 800 pages is 
now insufficient for a complete coverage of standard techniques.) How 
many other areas need independent monograph or book treatment? 

(3) Does anyone know when the correlation coefficient is useful, as 
opposed to when it is used? If so, why not tell us? What substitutes are
better for which purposes? 

(4) Why do we test normality? What do we learn? What should we 
learn? 

(5) How soon are we going to develop a well-informed and consistent 
body of opinion on the multiple comparison problem? Can we start soon 
with the immediate end of adding to knowledge? And even agree on 
the place of short cuts? 

(6) How soon are we going to separate regression situations from com- 
parison situations in the analysis of variance? When will we clearly 
distinguish between temperatures and brands, for example, as classifi- 
cations? 

(7) What about regression problems? Do we help our clients to use 
regression techniques blindly or wisely? What are the natural areas in 
regression? What techniques are appropriate in each? How many have 
considered the “analyses of variance” corresponding to taking out the
regression coefficients in all possible orders? 

(8) What about significance vs. confidence? How many experimental 
statisticians are feeding their clients significance procedures when 
available confidence procedures would be more useful? How many are 
doing the reverse? 

(9) Who has clarified, or can clarify, the problem of nonorthogonal 
(disproportionate) analysis of variance? What should we be trying to do 
in such a situation? What do the available techniques do? Have we 
allowed the superstition that the individual sums of squares should 
add up to the total sum of squares to mislead us? Do we need to find 
new techniques, or to use old ones better? 

(10) What of the analysis of covariance? (There are a few—at least
one [10]—discussions which have been thought about.) How many
experimental statisticians know more than one technique of interpre- 
tation? How many of these know when to use each? What are all the 
reasonable immediate aims of using a covariable or covariables? What 
techniques correspond to each? 

(11) What of the analysis of variance for vectors? Should we use overt 
multivariate procedures, or the simpler ones, ones that more closely 
resemble single variable techniques, which depend on the largest de- 
terminantal root? Who has a clear idea of the strength or scope of such 
methods? 

(12) What of the counting problems of nuclear physics? (For some of 
these the physicists have sound asymptotic theory, for others repairs 
are needed—cf. Link [21].) What happens less asymptotically? What 
about the use of transformations? What sort of nuisance parameter is 
appropriate to allow for non-Poisson fluctuations? What about the 
more complex problems? 

(13) What about the use of transformations? Have the pros and cons 
been assembled? Will the swing from significance to confidence increase 
the use of transformations? How accurate does a transformation need 
to be? Accurate in doing what? 

(14) Who has consolidated our knowledge about truncated and censored 
(cf. [18], p. 149) normal distributions so that it is available? Why not a
monograph here that really tells the story? Presumably the techniques 
and insight here are relatively useful, but how and for what? 

(15) What about range-based methods for more complex situations? (We 
have methods for the analysis of single and double classifications based 
on ranges.) What about methods for more complex designs like bal- 
anced incomplete blocks, higher and fractional factorials, lattices, etc.? 
In which areas would they be quicker and easier? In which areas would 
they lead to deeper insight? 

(16) Do the recent active discussions about bioassay indicate the solution 
or impending solution of any problems? What about logits vs. probits? 
Minimum chi-square vs. maximum likelihood? Less sophisticated 
methods vs. all these? Which methods are safe in the hands of an ex- 
pert? Which in the hands of a novice? Does a prescribed routine with 
a precise “correct answer” have any value as such? 

(17) What about life testing? What models should be considered be- 
tween the exponential distribution and the arbitrary distribution? 
What about accelerated testing? (Clearly we must use it for long- 
lived items.) To what extent must we rely on actual service use to 
teach us about life performance?

(18) How widely should we use angular randomization [4]? What are
its psychological handicaps and advantages? Dare we use it in explora- 
tory experimentation? What will be its repercussions on the selection 
of spacings? 

(19) How should we seek specified sorts of inhomogeneity of variance 
about a regression? What about simple procedures? Can we merely 
regress the squared deviations from the fitted line on a suitable func- 
tion? (Let us not depend on normality of distribution in any case!) 
What other approaches are helpful? 
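Question (19) suggests simply regressing the squared deviations from a fitted line on a suitable function. The following is a minimal Python sketch of that idea, under our own assumptions: the function names, the default choice g(x) = x, and the data are all hypothetical. It returns only a descriptive slope. Attaching a significance level would bring back the distributional assumptions that the question warns against.

```python
def ols_line(x, y):
    """Ordinary least-squares intercept and slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def spread_slope(x, y, g=lambda v: v):
    """Fit y on x, then regress the squared deviations from the fitted line
    on g(x) and return that second slope (positive: spread grows with g(x))."""
    a, b = ols_line(x, y)
    sq_dev = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return ols_line([g(xi) for xi in x], sq_dev)[1]


# Hypothetical data: one straight line, once with constant-size deviations
# (+1, -1, +1, ...) and once with deviations growing with x (+1, -2, +3, ...).
x = list(range(1, 11))
signs = [1 if i % 2 == 0 else -1 for i in range(10)]
y_const = [2 + 3 * xi + s for xi, s in zip(x, signs)]
y_grow = [2 + 3 * xi + s * xi for xi, s in zip(x, signs)]

print(round(spread_slope(x, y_const), 3))  # essentially zero
print(round(spread_slope(x, y_grow), 3))   # clearly positive
```

Other choices of g, such as the fitted values themselves, fit the same pattern. The choice is part of specifying which sort of inhomogeneity is being sought.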

(20) How soon can we begin to integrate selection theory? How does the 
classical theory for an infinite population (as reviewed by Cochran 
[8]) fit together with the second immediate aim of multiple comparisons 
(Bechhofer e¢ al. [1, 2, 14]) and with the a priori views of Berkson [3] 
and Brown [6]? What are the essential parameters for the characteriza- 
tion of a specific selection problem? 

(21) What are appropriate logical formulations for item analysis (as 
used in the construction of psychological tests)? (Surely simple signifi- 
cance tests are inappropriate!) Should we use the method introduced 
by Eddington [32, pp. 101-4] to estimate the true distribution of se- 
lectivity? Should we then calculate the optimum cut off point for this 
estimated true distribution? Or what? 

(22) What should we do when the items are large and correlated? (If, 
for example, we start with 150 measures of personality, and seek to 
find the few most thoroughly related to a given response or attitude.) 
What kind of sequential procedure? How much can we rely on routine 
item analysis techniques? How does experiment for insight differ from 
experiment for prediction? 

(23) How many experimental statisticians are aware of the problems of 
astronomy? What is there in Trumpler and Weaver’s book [32] that is 
new to most experimental statisticians? What in other observational 
problems like the distribution of nebulae (e.g. [23, 26])? 

(24) How many experimental statisticians are aware of the problems 
of geology? What is there in the papers on statistics in geology in the 
Journal of Geology for November 1953 and January 1954 that is new 
to most experimental statisticians? What untreated problems are sug- 
gested there? 

(25) How many experimental statisticians are aware of the problems 
of meteorology? What is there in the books of Conrad and Pollak [9] and 
of Carruthers and Brooks [7] that is new to most experimental statis- 
ticians? What untreated problems are suggested there? 

(26) How many experimental statisticians are aware of the problems
of particle size distributions? What is there in Herdan’s book [21] on
small particle statistics that is new to most experimental statisticians?
What untreated problems are suggested there?

724 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1954

(27) What is the real situation concerning the efficiency of designs with
self-adjustable analyses (lattices, self-weighted means, etc.) as compared
with their apparent efficiency? Meier [25] has attacked this problem for
some of the standard cases, but what are the repercussions? What
will happen in other cases? Is there any generally applicable rule of 
thumb which will make approximate allowance for the biases of un- 
sophisticated procedures? 

(28) How can we bring the common principles of design of experiments 
into psychometric work? How can we make allowance for order, prac- 
tice, transfer of training, and the like through specific designs? Are 
environmental variations large enough so that factorial studies should 
always be done simultaneously in a number of geographically separated 
locations? Don’t we really want to factor variance components? If so, 
why not design psychometric experiments to measure variance com- 
ponents? 

(29) How soon will we appreciate that the columns (or rows) of a con- 
tingency table usually have an order? When there is an order, shouldn’t 
we take this into account in our analyses? How can they be efficient
otherwise? Should we test only against ordered alternatives? If not, 
what is a good rule of thumb for allocating error rates? Yates [40] 
has proposed one technique. What of some others and a comparison 
of their effectivenesses? 

We come now to a set of questions which belong in the list, but which 
we shall treat only briefly since substantial work is known to be in 
progress: 

(30) What usefully can be done with $m \times n$ contingency tables?

(31) What of a very general treatment of variance components? 

(32) What should we really do with complex analyses of variance? 

(33) How can we modify means and variances to provide good effi- 
ciency for underlying distributions which may or may not be normal? 

(34) What about statistical techniques for data about queues, tele- 
phone traffic, and other similar stochastic processes? 

(35) What are the possibilities of very simple methods of spectral 
analysis of time series? 

(36) What are the variances of cospectral and quadrature spectral 
estimates in the Gaussian case? 

(37) What are useful general representations for higher moments of 
stationary time series?

Next we revert to open questions:

(38) How should we measure and analyze data where several coordi- 
nates replace the time? What determines the efficiency of a design? 
Should we use numerical filtering followed by conventional analysis? 
How much can we do inside the crater? 

(39) What of an iterative approach to discrimination? Can Penrose’s 
technique [28] be usefully applied in a multistage or iterative way or 
both? Does selecting two composites from each of several subgroups 
and then selecting supercomposites from all these composites pay? If
we remove regression on the first two composites from all variables, 
can we usefully select two new composites from among the residuals? 

(40) Can the Penrose idea be applied usefully to other multiple regres- 
sion situations? Can we use either the simple Penrose or the special 
methods suggested above? 

(41) Is there any sense in seeking a method of “internal discriminant 
analysis”? Such a method would resemble factor analysis in resting on 
no external criterion, but might use discriminant-function-like tech- 
niques. 

(42) Why is there not a clearer discussion of higher fractionation? 
Fractionation (by which we include both fractional factorials and
confounding) is reasonably well expounded for the $2^n$ case. But who can
make $3^n$, $4^n$, $5^n$, etc. relatively intelligible?

(43) How many useful fractional factorial designs escape the present 
group theoretical techniques? After all, Latin Squares are $1/k$ths of a $k^3$,
and most transformation sets do not correspond to simple group 
theory. 
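The remark in (43) rests on a count. A $k \times k$ Latin square, read as runs (row, column, letter), uses only $k^2$ of the $k^3$ level combinations of a three-factor factorial at $k$ levels. Every pair of the three factors still appears in all $k^2$ of its combinations. The sketch below checks this for the cyclic square, letter ≡ row + column (mod $k$), which is one square that a simple modular (group) rule generates. The function names are our own, and the check applies equally to any Latin square supplied as runs.

```python
from itertools import product


def cyclic_latin_square_runs(k):
    """Runs (row, column, letter) of the cyclic k x k square: letter = (row + column) mod k."""
    return [(r, c, (r + c) % k) for r in range(k) for c in range(k)]


def every_pair_complete(runs, k):
    """True if, for each pair of the three factors, all k*k level pairs occur exactly once."""
    all_pairs = sorted(product(range(k), repeat=2))
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        if sorted((run[i], run[j]) for run in runs) != all_pairs:
            return False
    return True


k = 4
runs = cyclic_latin_square_runs(k)
full = set(product(range(k), repeat=3))  # the complete k^3 factorial
print(len(runs), len(full), set(runs) <= full)  # 16 64 True
print(every_pair_complete(runs, k))  # True
```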

(44) In many applications of higher fractionals, the factors are scaled— 
why don’t we know more about the confounding of the various orthogonal 
polynomials and their interactions (products)? Even a little inquiry shows 
that some particular fractionals are much better than others of the 
same type. 

(45) What about redundant fractions of mixed factorials? We know
perfectly well that there is no useful simple (nonredundant) fraction 
of a 273*4!, but there may be a redundant one, where we omit some 
observations in estimating each effect. What would it be like? 

A number of further provocative questions have been suggested by 
others as a result of the distribution of advance copies of this paper 
and its oral presentation. I indicate some of them in my own words 
and attitude: 

(46) To what extent should we emphasize the practical power of a test? 
Here the practical power is defined as the product of the probability
of reaching a definite decision given that a certain technique is used
and the probability of using the technique. (C. Eisenhart)
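The definition in (46) is a simple product. The small sketch below uses hypothetical probabilities to show how a weaker but routinely used test can have the greater practical power. The function name is ours.

```python
def practical_power(p_decision_if_used, p_used):
    """Practical power as described in (46): the probability of reaching a
    definite decision when the technique is used, times the probability
    that the technique is used at all."""
    for p in (p_decision_if_used, p_used):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_decision_if_used * p_used


# Hypothetical figures, for illustration only.
elaborate = practical_power(0.90, 0.20)  # powerful test, seldom used by clients
simple = practical_power(0.70, 0.90)     # weaker test, nearly always used
print(round(elaborate, 2), round(simple, 2))  # 0.18 0.63
```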

(47) What of regression with error in x? Are the existing techniques
satisfactory in the linear case? What of the nonlinear case? (K. A.
Brownlee)

(48) What of regression when the errors suffer from unknown auto- 
correlations? What techniques can be used? How often is it wise to use 
them? (K. A. Brownlee) 

(49) How can we make it easier for the statistician to “psychoanalyze”
his client? What are his needs? How can the statistician uncover them? 
What sort of a book or seminar would help him? (W. H. Kruskal) 

(50) How can statisticians be successful without fooling their clients to 
some degree? Isn’t their professional-to-client relation like that of a 
medical man? Must they not follow some of the principles? Do statis- 
ticians need a paraphrase of the Hippocratic Oath? (W. H. Kruskal) 

(51) How far dare a consultant go when invited? Once a consultant is
trusted in statistical analysis and design, then his opinion is asked on 
a wider and wider variety of questions. Should he express his opinion 
on the general direction that a project should follow? Where should he 
draw the line? (R. L. Anderson) 

In closing these questions, it should not be necessary to remind the
reader that neither in the last section of examples nor in this section of
provocative questions have we tried to suggest an order of importance
for the unsolved problems raised. We leave that to the reader.


TOOL BUILDING VS. PROBLEM SOLVING 


To judge from published books and articles, experimental statistics 
has grown by finding tools somehow, and then running around using 
them. (This impression is undoubtedly somewhat inaccurate.) Why has 
experimental statistics not been more obviously concerned with prob- 
lems? Partly, perhaps, because it is just beginning to get its growth. 
Partly, perhaps, because dealing with problems is difficult and likely 
to lead to approximate solutions. These are valid reasons, but not 
valid excuses. 

As experimental statistics grows toward maturity, it surely should 
orient more toward areas rather than toward techniques. How much 
more may be a question. But an essential prerequisite to such reorienta- 
tion is some picture of what are the areas. This picture will not spring 
forth full armed, but will come from much work and discussion. As an 
attempted trigger for this work and discussion, the next section presents
a feeble first attempt at classification. Reader, can you do better?


A FEEBLE GUIDE TO AREAS


We shall set up a digital classification, but without prejudice
as to whether the classification provided by one digit is crossed with 
or nested inside that provided by another. The digits provided will 
usually not specify an area completely, but they will usually narrow 
the situation down to a small number of areas. 

The first digit classification refers to the general end of the analysis 
as follows: 

(The assessment of, or determination of a wise action in view of) 


(1) Typical response 

(2) Variability of response 

(3) Distribution of response 

(4) Concealed structures and their coefficients 

(5) Control charts and other “spotting” procedures 
(9) Miscellaneous 


(If answers are expressible in simple or mixed cumulants, then the 
degree of these cumulants with respect to response variables is control- 
ling. (1) contains cases of degree 1; (2) contains cases of degree 2; (3) 
contains cases of higher degree.) Under (1) are included regression 
coefficients as well as means, while correlation analysis considered as a 
study of predictability comes under (2). Contingency tables fall under 
(1), except when the issue is homogeneity, when they fall under (2). 
Factor analysis seems better placed under (4) than under (2), but 
structural regression, as practiced in econometrics, seems to fall most 
naturally under (1). 

The second digit classification refers to the situation of measurement, 
and, in description at least, has to be subordinated to the first digit. 
It runs 


(-1) Isolated (one or a few) responses, isolated (one or a few) vari- 
abilities, isolated (one or a few) distributions, etc. 

(-2) Response curves or surfaces, variabilities as functions of en- 
vironmental variables, etc. 

(-3) Inverse responses (what environment(s) produces a given re- 
sponse), inverse variabilities, etc. 

(-4) Response to nonenvironmental variable (e.g. time shape of 
pulses, distribution of grain sizes, power spectrum of time 
series.)

(-9) Miscellaneous 


All of bioassay and sensitivity testing will of course be found in (-3).






Problems of maximization of response by altering quantitative
variables fall best into (-2), since attempts to put them into (-3) as the
search for that environment where the derivatives vanish seem unwise. 

The third digit classification refers to the nature of the measurement, 
and is easy to apply, namely 


(--1) Absolute measurements without calibration problems 

(--2) Intermediate cases 

(--3) Absolute measurements by comparison with a standard 

(--4) Comparative measurements among a family without calibra- 
tion problems 

(--5) Intermediate cases

(--6) Comparative measurements among a family with the aid of 
standards 

(--9) Miscellaneous 


The conventional problems of bioassay fall in (--3), while sensitivity 
to explosion or breakage problems based on falling weights may fall in 
(--1). Conventional comparisons of varieties and fertilizers are usually 
thought to fall in (--4), but must, in many cases, fall in (--4). 

The fourth digit expresses the kind of response considered, and is 
again easy to apply. The classes are: 


(---, 1) Directly measured responses 
(---, 2) Responses measured as slopes or regression coefficients 
(---, 3) Adjusted responses (as by covariance) 


No examples seem to be needed. 
The fifth digit specifies the nature of the response, as follows: 


(---, -1) Measured response (on reproducible scale) 
(---, -2) Scored or rated response (by judge or panel) 
(---, -3) Counted (all-or-none) response 

(---, -9) Miscellaneous 


At the present, the impact of this digit on statistical technique is very 
noticeable. Should it remain so? 
The sixth digit specifies the complexity of the response, as follows: 


(---, --1) Single variate response
(---, --2) Bivariate response
(and so on)
(---, --8) Many variate response
(---, --9) Miscellaneous


Examples here are not needed.

The seventh digit describes the complexity of the environments considered,
as follows:


(---, ---1) Environment varied only randomly

(---, ---2) Environment varied in one measured way

(---, ---3) Environment varied randomly and in one measured way

(---, ---6) Environment varied in a more complex manner
(---, ---9) Miscellaneous


When populations are cross-classified with respect to two
or more classifications or polytomies, questions often arise 
about the degree of association existing between the several 
polytomies. Most of the traditional measures or indices of as- 
sociation are based upon the standard chi-square statistic or 
on an assumption of underlying joint normality. In this paper 
a number of alternative measures are considered, almost all 
based upon a probabilistic model for activity to which the 
cross-classification may typically lead. Only the case in which 
the population is completely known is considered, so no ques- 
tion of sampling or measurement error appears. We hope, 
however, to publish before long some approximate distribu- 
tions for sample estimators of the measures we propose, and 
approximate tests of hypotheses. Our major theme is that the 
measures of association used by an empirical investigator 
should not be blindly chosen because of tradition and con- 
vention only, although these factors may properly be given 
some weight, but should be constructed in a manner having 
operational meaning within the context of the particular prob- 
lem. 


1. INTRODUCTION
Many studies, particularly in the social sciences, deal with populations
of individuals which are thought of as cross-classified by
two or more polytomies. For example, the adult individuals living in 
New York City may be classified as to 


Borough: 5 classes 
Newspaper most often read: perhaps 6 classes 
Television set in home or not: 2 classes 

Level of formal education: perhaps 5 classes 
Age: perhaps 10 classes 


For simplicity we deal largely with the case of two polytomies, although 
many of our remarks may be extended to a greater number. The double 
polytomy is the most common, no doubt because of the ease with which 
it can be tabulated and displayed on the printed page. Most of our 
remarks suppose the population completely known in regard to the 
classifications, and indeed this seems to be the way to begin in the 
construction of rational measures of association. After agreement has 
been reached on the utility of a measure for a known population, then
one should consider the sampling problems associated with estimation
and tests about this population parameter. 

A double polytomy may be represented by a table of the following 
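kind:¹

$$
\begin{array}{c|cccc|c}
 & B_1 & B_2 & \cdots & B_\beta & \text{Total} \\
\hline
A_1 & p_{11} & p_{12} & \cdots & p_{1\beta} & p_{1\cdot} \\
A_2 & p_{21} & p_{22} & \cdots & p_{2\beta} & p_{2\cdot} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
A_\alpha & p_{\alpha 1} & p_{\alpha 2} & \cdots & p_{\alpha\beta} & p_{\alpha\cdot} \\
\hline
\text{Total} & p_{\cdot 1} & p_{\cdot 2} & \cdots & p_{\cdot\beta} & 1
\end{array}
$$
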
where 


Classification A divides the population into the $\alpha$ classes
$A_1, A_2, \ldots, A_\alpha$.

Classification B divides the population into the $\beta$ classes
$B_1, B_2, \ldots, B_\beta$.

The proportion of the population that is classified as both $A_a$ and
$B_b$ is $p_{ab}$.


The marginal proportions will be denoted by 


$p_{a\cdot}$ = the proportion of the population classified as $A_a$.
$p_{\cdot b}$ = the proportion of the population classified as $B_b$.
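The marginal proportions are the row and column sums of the joint proportions $p_{ab}$. The following is a minimal Python sketch using a hypothetical $2 \times 3$ table. The function name `marginals` is ours.

```python
def marginals(p):
    """Return the row margins p_a. and column margins p_.b of a joint table p[a][b]."""
    row = [sum(r) for r in p]
    col = [sum(r[b] for r in p) for b in range(len(p[0]))]
    return row, col


# Hypothetical 2 x 3 table of proportions (alpha = 2, beta = 3; entries sum to 1).
p = [[0.10, 0.20, 0.10],
     [0.30, 0.15, 0.15]]
row, col = marginals(p)
print([round(v, 2) for v in row], [round(v, 2) for v in col])  # [0.4, 0.6] [0.4, 0.35, 0.25]
```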


If the use to which a measure of association were to be put could be 
precisely stated, there would be little difficulty in defining an appropri- 
ate measure. For example, using the above cross-classification of the 
New York City population, a television service company might wish to
place a single newspaper advertisement which would be read by as
many prospective customers as possible. Then the important information
from the table of newspaper-most-often-read vs. television-set-in-home-or-not
would be: which newspaper is most often read among
those with television sets? And a reasonable measure of association
would simply be the proportion of those with television sets who read
this newspaper.

¹ Tables of this kind are frequently called contingency tables. We shall not use this term because of
its connotation of a specific sampling scheme when the population is not known and one infers on the
basis of a sample.

MEASURES OF ASSOCIATION FOR CROSS CLASSIFICATIONS 735
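In the newspaper example, the advertiser's measure is a conditional proportion read from the "television set" column of the table. The sketch below uses a hypothetical three-newspaper table. The newspaper names, the proportions and the function `top_paper_among_tv_owners` are invented for illustration.

```python
def top_paper_among_tv_owners(p, papers, tv_col=0):
    """Return the newspaper most often read among those with a television set,
    and the proportion of TV owners who read it (p[a][tv_col] / p_.tv)."""
    tv_total = sum(r[tv_col] for r in p)
    shares = [r[tv_col] / tv_total for r in p]
    best = max(range(len(papers)), key=lambda a: shares[a])
    return papers[best], shares[best]


# Hypothetical table: rows = newspaper most often read, columns = (TV set, no TV set).
papers = ["Paper 1", "Paper 2", "Paper 3"]
p = [[0.15, 0.10],
     [0.25, 0.05],
     [0.20, 0.25]]
name, share = top_paper_among_tv_owners(p, papers)
print(name, round(share, 3))  # Paper 2 0.417
```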

It is rarely the case, however, that the purpose of an investigation 
can be so specifically stated. More typically an investigation is ex- 
ploratory or has a multiplicity of goals. Sometimes a measure of associ- 
ation is desired simply so that a large mass of data may be summarized 
compactly. 

The basic theme of this paper is that, even though a single precise 
goal for an investigation cannot be specified, it is still possible and 
desirable to choose a measure of association which has contextual 
meaning, instead of using as a matter of course one of the traditional 
measures. In order to choose a measure of association which has mean- 
ing we propose the construction of probabilistic models of predictive 
activity, the particular model to be chosen in the light of the particular 
investigation at hand. The measure of association will then be a prob- 
ability, or perhaps some simple function of probabilities, within such a 
model. Such is our general contention; most of the remainder of this 
paper is concerned with its exemplification in particular instances. 

We wish to emphasize that the specific measures of association de- 
scribed here are not presented as factotum or universal measures. 
Rather, they are suggested as reasonable for use in appropriate circum- 
stances only, and even in those circumstances other measures may and 
should be considered and investigated. 

A good deal of attention has been paid in the literature to the special 
case of two dichotomies. We are more interested here in measures of 
association suitable for use with any numbers of classes in the polyto- 
mies or classifications. 


2. FOUR PRELIMINARY CONSIDERATIONS 
Four distinctions or cautionary remarks should be made early in any 
discussion of measures of association. 


2.1. Continua 


We may or may not wish to think of a polytomy as arising from an 
underlying continuum. For example, age may for convenience be divided
into ten classifications, but it clearly does arise from an underlying
continuum; however, newspaper-most-often-read would scarcely
be so construed. If a polytomy does arise from an underlying continuum 
one may or may not wish to assume that the population has some spe- 
cific kind of distribution with respect to it. 

In those cases in which all the polytomies of a study arise jointly from 
a multivariate normal distribution on an underlying continuum, one 
would naturally turn to measures of association based on the correla- 
tion coefficients. These in turn might well be estimated from a sample 
by the tetrachoric correlation coefficient method or a generalization of 
it. In some cases one polytomy may arise from a continuum and the 
other not. An interesting discussion of this case for two dichotomies 
was given in 1915 by Greenwood and Yule ([3], Section 3). We do
not discuss either of these cases in this paper, but restrict ourselves to 
situations in which there are no relevant underlying continua. 

The desirability of assuming an underlying joint continuum was one 
of the issues of a heated debate forty years ago between Yule [15] on 
the one hand and K. Pearson and Heron [9] on the other. Yule’s 
position was that very frequently it is misleading and artificial to 
assume underlying continua; Pearson and Heron argued that almost 
always such an assumption is both justified and fruitful. 


2.2. Order 


There may or may not be an underlying order between the classifi- 
cations of a polytomy. For example “level of formal education” admits 
an obvious ordering; but borough of residence would not usually be 
thought of in an ordered way. If there is an ordering, it may or may 
not be relevant to the investigation. Sometimes an ordering may be 
important but not its direction. If there is an underlying one-dimen- 
sional continuum, it establishes an ordering. 

When there is no natural or relevant ordering of the classes of a 
polytomy, one may reasonably ask that a measure of association not 
depend on the particular order in which the classes are tabulated. 


2.3. Symmetry 


It may or may not be that one looks at two polytomies symmetri- 
cally. When we are sure a priori that a causal relationship (if it exists) 
runs in one direction but not the other, then our viewpoint will be 
asymmetric. This will also happen if one plans to use the results of the 
experiment in one direction only. On the other hand, there is often no 
reason to give one polytomy precedence over another.


2.4. Manner of Formation of the Classes


Decisions about the definitions of the classes of a polytomy, or 
changes from a finer to a coarser classification (or vice-versa), can 
affect all the measures of association of which we know. For example, 
suppose we begin with the 4 × 4 table

[4 × 4 table of proportions; the recoverable entries are .25, 0, 0, 0]

and combine neighboring pairs of classes. We obtain

which might greatly change a measure of association. Or we might 
combine the three bottom rows and the three right-hand columns. 
This gives

which presents quite a different intuitive degree of association. By 
other poolings one can obtain other 2 × 2 tables.

Although this example is extreme, similar changes can be made in 
the character of almost any cross-classification table. Related examples 
are discussed by Yule [15]. 
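The effect of pooling can be seen numerically. The sketch below uses a hypothetical 4 × 4 table of our own choosing, in which each row class falls entirely in one column class. As the measure it uses Cramér's normed mean square contingency $\phi^2/\min(\alpha-1, \beta-1)$, equation (7) of Section 4. Pooling neighboring pairs of classes turns complete association into independence. Pooling the first class against the remaining three leaves complete association. The function names are ours.

```python
def pool(p, row_groups, col_groups):
    """Combine classes: each group is a list of original row (or column) indices."""
    return [[sum(p[a][b] for a in rg for b in cg) for cg in col_groups]
            for rg in row_groups]


def cramer_index(p):
    """phi^2 / min(alpha - 1, beta - 1) for a table of proportions with nonzero margins."""
    row = [sum(r) for r in p]
    col = [sum(r[b] for r in p) for b in range(len(p[0]))]
    phi2 = sum((p[a][b] - row[a] * col[b]) ** 2 / (row[a] * col[b])
               for a in range(len(row)) for b in range(len(col)))
    return phi2 / min(len(row) - 1, len(col) - 1)


# Hypothetical 4 x 4 table: row class a goes wholly to one column class.
p = [[0.25, 0, 0, 0],
     [0, 0, 0.25, 0],
     [0, 0.25, 0, 0],
     [0, 0, 0, 0.25]]
neighbors = pool(p, [[0, 1], [2, 3]], [[0, 1], [2, 3]])
first_vs_rest = pool(p, [[0], [1, 2, 3]], [[0], [1, 2, 3]])
print(round(cramer_index(p), 6), round(cramer_index(neighbors), 6),
      round(cramer_index(first_vs_rest), 6))  # 1.0 0.0 1.0
```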

At first this consideration might seem to vitiate any reasonable dis- 
cussion of measures of association. We feel, however, that it is in fact 
desirable that a measure of association reflect the classes as defined for
the data. Thus one should not speak, for example, of association between
income level and level of formal education without specifying particular 
class definitions. Of course, in many cases association—however meas- 
ured—would not be much affected by any reasonable redefinition of 
the classes, and then the above finicky form of statement can be simpli- 
fied. That the definition of the classes can affect the degree of associa- 
tion naturally means that careful attention should be given to the class 
definitions in the light of the expected uses of the final conclusions. 


3. CONVENTIONS 


It is conventional, and often convenient, to set up a measure of 
association so that either 


(i) It takes values between −1 and +1 inclusive, is −1 or +1 in
case of “complete association,” and is zero in the case of independence.

(ii) It takes values between 0 and +1 inclusive, is +1 in the case
of “complete association,” and is zero in the case of independence.


Convention (i) is appropriate when the association is thought of as
signed (e.g., association between income and dollars spent is positive,
between income and per cent of income spent is negative). Convention
(ii) is appropriate when no such sign considerations exist, as when
there is no natural order.

“Complete association,” as we shall see, is somewhat ambiguous. 
“Independence,” on the other hand, has its usual meaning, that is 


$$p_{ab} = p_{a\cdot}\, p_{\cdot b} \qquad (a = 1, \ldots, \alpha;\ b = 1, \ldots, \beta). \qquad (1)$$
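Condition (1) can be checked cell by cell for a table of proportions. The following is a minimal sketch. The function name, the tolerance used to absorb floating-point error, and both example tables are our own assumptions.

```python
import math


def is_independent(p, tol=1e-12):
    """Check equation (1), p_ab = p_a. * p_.b, for every cell of a joint table."""
    row = [sum(r) for r in p]
    col = [sum(r[b] for r in p) for b in range(len(p[0]))]
    return all(math.isclose(p[a][b], row[a] * col[b], rel_tol=0.0, abs_tol=tol)
               for a in range(len(row)) for b in range(len(col)))


# Hypothetical tables: the first is built as an outer product of its margins.
independent = [[ra * cb for cb in (0.5, 0.3, 0.2)] for ra in (0.4, 0.6)]
dependent = [[0.30, 0.05, 0.05],
             [0.20, 0.25, 0.15]]
print(is_independent(independent), is_independent(dependent))  # True False
```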


Conventions like these have seemed important to some authors, but 
we believe they diminish in importance as the meaningfulness of the 
measure of association increases. One real danger connected with such 
conventions is that the investigator may carry over size preconceptions 
based upon experience with completely different measures subject to 
the same conventions. For example, some elementary statistics text- 
books warn that a population correlation coefficient less than about .5 
in absolute value may have little practical significance, in the sense 
that then the conditional variance is not much less than the marginal 
variance. Research workers in various fields thus tend to develop rather 
strong feelings that population correlation coefficients less than, say, 
.5, have little substantive importance. The same feelings might be
carried over, without justification, to all other measures of association
so defined as to lie between +1 and −1.

It should also be mentioned that once one has a measure of associ- 
ation satisfying one of the above conventions, then an infinite number 
of others also satisfying the same convention can be obtained—for 
example, by raising to a power and adjusting the sign if necessary. 


4. TRADITIONAL MEASURES


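Excellent accounts of these may be found in [16], Chaps. 2 and 3, and [7], Chap. 13. Many of these stem from the standard chi-square statistic upon which a test of independence is usually based. If a finite population has $\nu$ members and we set $\nu_{ab} = \nu p_{ab}$, $\nu_{a\cdot} = \nu p_{a\cdot}$, $\nu_{\cdot b} = \nu p_{\cdot b}$, etc., the chi-square statistic in the case of two classifications is

$$\chi^2 = \sum_a \sum_b \frac{(\nu_{ab} - \nu_{a\cdot}\nu_{\cdot b}/\nu)^2}{\nu_{a\cdot}\nu_{\cdot b}/\nu} = \nu \sum_a \sum_b \frac{(p_{ab} - p_{a\cdot}p_{\cdot b})^2}{p_{a\cdot}p_{\cdot b}}. \tag{2}$$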


A great deal of attention has been given to the case $\alpha = \beta = 2$. For this special case Yule has defined the following coefficient of association:

$$Q = \frac{\nu_{11}\nu_{22} - \nu_{12}\nu_{21}}{\nu_{11}\nu_{22} + \nu_{12}\nu_{21}}, \tag{3}$$


whose numerator squared is essentially the same as that of a convenient and popular form for $\chi^2$ in the 2×2 case. Another coefficient suggested by Yule for the 2×2 case is

$$Y = \frac{\sqrt{\nu_{11}\nu_{22}} - \sqrt{\nu_{12}\nu_{21}}}{\sqrt{\nu_{11}\nu_{22}} + \sqrt{\nu_{12}\nu_{21}}}. \tag{4}$$
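Both 2×2 coefficients are simple to compute from the four cell counts. The sketch below is ours, not the paper's; the function names `yule_q` and `yule_y` are hypothetical, and the indeterminate case (both cross-products zero) is reported as an error by assumption.

```python
import math

def yule_q(n11, n12, n21, n22):
    """Yule's coefficient of association Q, equation (3), from 2x2 cell counts."""
    num = n11 * n22 - n12 * n21
    den = n11 * n22 + n12 * n21
    if den == 0:
        raise ValueError("Q is indeterminate: both cross-products are zero")
    return num / den

def yule_y(n11, n12, n21, n22):
    """Coefficient (4): the same form as Q, built from square roots of the cross-products."""
    s, d = math.sqrt(n11 * n22), math.sqrt(n12 * n21)
    if s + d == 0:
        raise ValueError("coefficient (4) is indeterminate")
    return (s - d) / (s + d)

q = yule_q(2, 1, 1, 2)   # (4 - 1) / (4 + 1)
y = yule_y(2, 1, 1, 2)   # (2 - 1) / (2 + 1)
print(q, y)
```

Since $Q = 2Y/(1 + Y^2)$ follows algebraically from (3) and (4), the two coefficients always order tables identically and differ only in scale.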


A coefficient often used for the general α×β case is simply $\chi^2/\nu$, often called the mean square contingency and denoted by $\phi^2$. A variation of this, suggested by Karl Pearson, is

$$C = \left[\frac{\chi^2/\nu}{1 + \chi^2/\nu}\right]^{1/2}, \tag{5}$$

which has been called the coefficient of contingency, or the coefficient of mean square contingency. Another variation, proposed by Tschuprow, is

$$T = \left[\frac{\chi^2/\nu}{\sqrt{(\alpha - 1)(\beta - 1)}}\right]^{1/2}. \tag{6}$$





The last two suggestions, according to Kendall [7], were made in attempts to norm $\chi^2$ so that it might lie between 0 and 1 and take the extreme values under independence and "complete association." Cramér ([1], p. 282) suggests the following variant:

$$\left[\frac{\chi^2/\nu}{\min(\alpha - 1,\ \beta - 1)}\right]^{1/2}, \tag{7}$$

which gives a better norming than does C or T, since it lies between 0 and 1 and actually attains both end points appropriately. Cramér's suggestion does not seem to be well known by workers using this general kind of index.
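A minimal Python sketch of (2) and (5) through (7) follows, assuming a table of raw counts with no empty row or column (otherwise an expected count is zero and the division fails). The function name `chi_square_measures` is our own.

```python
import math

def chi_square_measures(table):
    """Return (phi2, C, T, cramer) for an alpha x beta table of counts.

    phi2 = chi^2 / nu is the mean square contingency, with chi^2 from (2);
    C, T and Cramer's measure follow (5), (6) and (7).
    Assumes every row and column total is positive.
    """
    nu = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    alpha, beta = len(rows), len(cols)
    chi2 = 0.0
    for a, row in enumerate(table):
        for b, n_ab in enumerate(row):
            expected = rows[a] * cols[b] / nu   # nu_a. * nu_.b / nu
            chi2 += (n_ab - expected) ** 2 / expected
    phi2 = chi2 / nu
    c = math.sqrt(phi2 / (1 + phi2))
    t = math.sqrt(phi2 / math.sqrt((alpha - 1) * (beta - 1)))
    cramer = math.sqrt(phi2 / min(alpha - 1, beta - 1))
    return phi2, c, t, cramer

# A 2x3 table in which each A class determines its B class exactly:
print(chi_square_measures([[5, 5, 0], [0, 0, 10]]))
```

For that 2×3 table Cramér's measure reaches 1 while T stays near .84, which illustrates the point about norming made above.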

The fact that an excellent test of independence may be based on $\chi^2$ does not at all mean that $\chi^2$, or some simple function of it, is an appropriate measure of degree of association. A discussion of this point is presented by R. A. Fisher ([2], Section 21). We have been unable to find any convincing published defense of $\chi^2$-like statistics as measures of association.

One difficulty with the use of the traditional measures, or of any 
measures that are not given operational interpretation, is that it is 
difficult to compare meaningfully their values for two cross-classifica- 
tions. Suppose that C turns out to be .56 and .24 respectively in two 
cross-classification tables. One wants to be able to say that there is 
higher association in the first table than the second, but investigators 
sometimes restrain themselves, with commendable caution, from 
making such a comparison. Their restraint may stem in part from the 
noninterpretability of C. (Of course, when samples are small they may 
also be restrained by inadequate knowledge of sampling fluctuation.) 

One class of measures that will not be discussed here is characterized 
by the assignment of numerical scores to the classes, followed by the 
use of the correlation coefficient on these scores. A recent article on 
such measures is by E. J. Williams [12]. It contains references leading 
back to earlier literature. We feel that the use of arbitrary scores to 
motivate measures is infrequently appropriate, but it should be pointed 
out that measures not motivated by the correlation of scores can often 
be thought of from the score viewpoint. 


5. MEASURES BASED ON OPTIMAL PREDICTION 


5.1. Asymmetrical Optimal Prediction. A Particular Model of Activity 


Let us consider first a probabilistic model which might be useful in 
a situation of the following kind: 





(i) Two polytomies, A and B.
(ii) No relevant underlying continua.
(iii) No natural ordering of interest.
(iv) Asymmetry holds: The A classification precedes the B classification chronologically, causally, or otherwise.


An example of such a situation might be a study of the association 
between college attended (A) and kind of adult occupation (B). Our 
model of activity is the following: An individual is chosen at random 
from the population and we are asked to guess his B-class as well as 
we can, either 


1. Given no further information, or 
2. Given his A class. 


Clearly we can do no worse in case 2 than in case 1. Represent by $p_{\cdot m}$ the largest marginal proportion among the B classes and by $p_{am}$ the largest proportion in the $a$th row of the cross-classification table, that is

$$p_{\cdot m} = \max_b p_{\cdot b}, \qquad p_{am} = \max_b p_{ab}. \tag{8}$$

Then in case 1 we are best off guessing that $B_b$ for which $p_{\cdot b} = p_{\cdot m}$, that is, guessing that B class which has the largest marginal proportion, and our probability of error is $1 - p_{\cdot m}$. In case 2 we are best off guessing that $B_b$ for which $p_{ab} = p_{am}$ (letting $A_a$ be the given A class), that is, guessing that B class that has the largest proportion in the observed A class, and our probability of error is² $1 - \sum_a p_{am}$.

Then we propose as a measure of association (following Guttman [4])

$$\lambda_b = \frac{(\text{Prob. of error in case 1}) - (\text{Prob. of error in case 2})}{\text{Prob. of error in case 1}} = \frac{\sum_a p_{am} - p_{\cdot m}}{1 - p_{\cdot m}}, \tag{9}$$


which is the relative decrease in probability of error in guessing B, as between A unknown and A known. To put this another way, $\lambda_b$ gives the proportion of errors that can be eliminated by taking account of knowledge of the A classifications of individuals.
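The two error probabilities in (9) can be computed directly from a table of proportions. The sketch below (the function name `lambda_b` is ours) assumes rows are A classes, columns are B classes, and the entries sum to 1; it is checked against the hospital tables of Section 5.4.

```python
def lambda_b(p):
    """lambda_b of equation (9) for a table of proportions p[a][b] summing to 1.

    Rows are A classes, columns are B classes.
    """
    col_totals = [sum(col) for col in zip(*p)]
    # Case 1: always guess the modal B class; error probability 1 - p_.m
    err_case1 = 1 - max(col_totals)
    # Case 2: guess the modal B class within the known row; error 1 - sum_a p_am
    err_case2 = 1 - sum(max(row) for row in p)
    if err_case1 == 0:
        raise ValueError("lambda_b is indeterminate: the population lies in one B class")
    return (err_case1 - err_case2) / err_case1

hospital_1 = [[0.84, 0.04], [0.03, 0.09]]
print(round(lambda_b(hospital_1), 3))
```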

Some important properties of $\lambda_b$ follow:





² It may be that in case 1 there is more than one b for which $p_{\cdot b} = p_{\cdot m}$. Then any method of choosing which of these b's to guess (including flipping an appropriately multi-sided die) gives rise to the same probability of error, $1 - p_{\cdot m}$. A similar comment applies to case 2.
(i) $\lambda_b$ is indeterminate if and only if the population lies in one column, that is, lies in one B class.

(ii) Otherwise the value of $\lambda_b$ is between 0 and 1 inclusive.

(iii) $\lambda_b$ is 0 if and only if knowledge of the A classification is of no help in predicting the B classification, i.e., if there exists a $b_0$ such that $p_{ab_0} = p_{am}$ for all $a$.

(iv) $\lambda_b$ is 1 if and only if knowledge of an individual's A class completely specifies his B class, i.e., if each row of the cross-classification table contains at most one nonzero $p_{ab}$.

(v) In the case of statistical independence $\lambda_b$, when determinate, is zero. The converse need not hold: $\lambda_b$ may be zero without statistical independence holding.

(vi) $\lambda_b$ is unchanged by permutation of rows or columns.


That $\lambda_b$ may be zero without statistical independence holding may be considered by some as a disadvantage of this measure. We feel, however, that this is not the case, for $\lambda_b$ is constructed specifically to measure association in a restricted but definite sense, namely the predictive interpretation given. If there is no association in that sense, even though there is association in other senses, one would want $\lambda_b$ to be zero. Moreover, all the measures of association of which we know are subject to this kind of criticism in one form or another, and indeed it seems inevitable. To obtain a measure of association one must sharpen the definition of association, and this means that of the many vague intuitive notions of the concept some must be dropped.

We may similarly define

$$\lambda_a = \frac{\sum_b p_{mb} - p_{m\cdot}}{1 - p_{m\cdot}}, \tag{10}$$

where

$$p_{m\cdot} = \max_a p_{a\cdot}, \qquad p_{mb} = \max_a p_{ab}. \tag{11}$$

Thus $\lambda_a$ is the relative decrease in probability of error in guessing A, as between B unknown and B known.

So far as we know, $\lambda_a$ and $\lambda_b$ were first suggested by Guttman ([4], Part I, 4), and our development of them is very similar to his.


5.2. Symmetrical Optimal Prediction. Another Model of Activity 


In many cases the situation is symmetrical, and one may alter the model of activity as follows: an individual is chosen at random from the population and we are asked to guess his A class half the time (at random) and his B class half the time (at random), given either:


1. No further information, or
2. The class of the individual other than the one being guessed; that is, the individual's A class when we guess his B class, and vice versa.


In case 1 the probability of error is $1 - \tfrac{1}{2}(p_{\cdot m} + p_{m\cdot})$, and in case 2 the probability of error is $1 - \tfrac{1}{2}\left(\sum_a p_{am} + \sum_b p_{mb}\right)$. Hence we may consider the relative decrease in probability of error as we go from case 1 to case 2, and define the coefficient

$$\lambda = \frac{\tfrac{1}{2}\left[\sum_a p_{am} + \sum_b p_{mb} - p_{\cdot m} - p_{m\cdot}\right]}{1 - \tfrac{1}{2}(p_{\cdot m} + p_{m\cdot})}. \tag{12}$$

Some properties of $\lambda$ follow:

(i) $\lambda$ is determinate except when the entire population lies in a single cell of the table.
(ii) Otherwise the value of $\lambda$ is between 0 and 1 inclusive.
(iii) $\lambda$ is 1 if and only if all the population is concentrated in cells no two of which are in the same row or column.
(iv) $\lambda$ is 0 in the case of statistical independence, but the converse need not hold.
(v) $\lambda$ is unchanged by permutations of rows or columns.
(vi) $\lambda$ lies between $\lambda_a$ and $\lambda_b$ inclusive.
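The computation of $\lambda_a$, $\lambda_b$, or $\lambda$ is extremely simple. Usually one is given the population, not in terms of the $p_{ab}$'s but rather in terms of the numbers of individuals in each cell. Let $\nu$ be the total number of individuals in the population, $\nu_{ab} = \nu p_{ab}$, $\nu_{am} = \nu p_{am}$, $\nu_{mb} = \nu p_{mb}$, and so on. Then

$$\lambda_b = \frac{\sum_a \nu_{am} - \nu_{\cdot m}}{\nu - \nu_{\cdot m}}, \tag{13}$$

$$\lambda_a = \frac{\sum_b \nu_{mb} - \nu_{m\cdot}}{\nu - \nu_{m\cdot}}, \tag{14}$$

$$\lambda = \frac{\sum_a \nu_{am} + \sum_b \nu_{mb} - \nu_{\cdot m} - \nu_{m\cdot}}{2\nu - (\nu_{\cdot m} + \nu_{m\cdot})}. \tag{15}$$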
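Formulas (13) through (15) need only row maxima, column maxima, and the largest marginal totals. A minimal sketch follows (the function name `lambdas_from_counts` is ours); it assumes the coefficients are determinate, and an indeterminate case simply raises Python's `ZeroDivisionError`. The accompanying check uses the hair and eye color table of Section 5.3.

```python
def lambdas_from_counts(n):
    """Return (lambda_a, lambda_b, lambda) from counts n[a][b], per (13)-(15).

    Raises ZeroDivisionError when a coefficient is indeterminate.
    """
    nu = sum(sum(row) for row in n)
    rows = [sum(row) for row in n]
    cols = [sum(col) for col in zip(*n)]
    sum_am = sum(max(row) for row in n)          # sum over a of nu_am (row maxima)
    sum_mb = sum(max(col) for col in zip(*n))    # sum over b of nu_mb (column maxima)
    nu_dot_m, nu_m_dot = max(cols), max(rows)    # nu_.m and nu_m.
    lam_b = (sum_am - nu_dot_m) / (nu - nu_dot_m)
    lam_a = (sum_mb - nu_m_dot) / (nu - nu_m_dot)
    lam = (sum_am + sum_mb - nu_dot_m - nu_m_dot) / (2 * nu - (nu_dot_m + nu_m_dot))
    return lam_a, lam_b, lam

hair_eye = [[1768, 807, 189, 47],
            [946, 1387, 746, 53],
            [115, 438, 288, 16]]
print([round(x, 4) for x in lambdas_from_counts(hair_eye)])
```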
5.3. An Example

The following table is taken from reference [7], p. 300, and was originally given by Ammon in "Zur Anthropologie der Badener." It deals with hair and eye color of males. The table is given in terms of the $\nu_{ab}$'s. $A_1$, $A_2$, $A_3$ are respectively Blue, Grey or Green, Brown; $B_1$, $B_2$, $B_3$, $B_4$ are respectively Fair, Brown, Black, Red.

Eye color          Hair color group
group            B1      B2      B3      B4      ν_a·
A1             1768     807     189      47      2811
A2              946    1387     746      53      3132
A3              115     438     288      16       857
ν_·b           2829    2632    1223     116      6800

Here $\nu_{\cdot m} = 2829$, $\nu_{m\cdot} = 3132$, $\sum_a \nu_{am} = 1768 + 1387 + 438 = 3593$, and $\sum_b \nu_{mb} = 1768 + 1387 + 746 + 53 = 3954$, so that

$$\lambda_a = \frac{3954 - 3132}{6800 - 3132} = \frac{822}{3668} = .2241,$$

$$\lambda_b = \frac{3593 - 2829}{6800 - 2829} = \frac{764}{3971} = .1924,$$

$$\lambda = \frac{822 + 764}{3668 + 3971} = \frac{1586}{7639} = .2076.$$


(Quotients are given to four places.) The traditional measures of association have the following values: $\chi^2/\nu = .1581$, $C = .3695$, $T = .2541$, Cramér's measure $= .2812$.

This example appears as an illustration of the usual approach to measures of association in [7], a standard statistical reference work. It is not hard to think of interpretations or variations in which one of the $\lambda$ coefficients would be appropriate. For example, one might be studying the efficacy of an identification scheme for males in which hair color was given but not eye color. Another example might be in connection with a study of popular beliefs about the relationship between hair color and eye color.


5.4. Weighting Columns or Rows 


In some cases, particularly when comparisons between different populations are important, the measures $\lambda_a$, $\lambda_b$, or $\lambda$ may not be suitable, since they depend essentially on the marginal frequencies. To put this in terms of the model of activity: in some cases we do not want to think of choosing an individual from the actual population at hand in a random way, but rather from some other population which is related to the actual population in terms of conditional frequencies.

This point is stressed by Yule in reference [15] and is illustrated by the kind of medical example³ given there. Suppose that we are concerned with the effects of a medical treatment on persons contracting an often fatal disease. Very large samples from two different hospitals are available, giving the following $p_{ab}$ tables:


                    Hospital I                  Hospital II
              Lived   Died   Total        Lived   Died   Total
Treated        .84    .04     .88          .42    .02     .44
Not treated    .03    .09     .12          .14    .42     .56
Total          .87    .13    1.00          .56    .44    1.00


Here the A classes are Treated or Not treated, and the B classes Lived or Died. The given numbers are p's and marginal p's.

We are interested in the association between treatment and life, and might conclude that $\lambda_b$ would be an appropriate measure of this. We find

$$\lambda_b \text{ for Hospital I} = \frac{.93 - .87}{1 - .87} = \frac{.06}{.13} = .462,$$

$$\lambda_b \text{ for Hospital II} = \frac{.84 - .56}{1 - .56} = \frac{.28}{.44} = .636.$$





³ We do not wish to suggest by this example that $\lambda_b$ is necessarily appropriate as a measure of association between treatment and cure. A very interesting discussion of this medical case has been given by Greenwood and Yule [3], who bring out many difficulties and suggest various viewpoints. Another interesting paper on the medical 2×2 table is that of Youden [14].
Yet the conditional probabilities of life, given treatment (nontreatment), are exactly the same for both hospitals, namely .955 (.250). The reason that the conditional probabilities are the same while the $\lambda_b$ values are different is, of course, that the two hospitals treated very different proportions of their patients. And the proportions treated were probably determined by factors having nothing to do with 'inherent' association between treatment and cure.

It may seem reasonable in such a case as this to replace our model of activity by one in which an individual is drawn from the population so that the probability of his being in any given $A_a$ is exactly $1/\alpha$, i.e., so that all A classes are equiprobable, and with conditional B class probabilities equal to those of the original population. That is to say, it may seem reasonable to replace the quantities $p_{ab}$ by the quantities

$$\frac{1}{\alpha}\,\frac{p_{ab}}{p_{a\cdot}} \tag{16}$$

and use this as the population to which $\lambda_b$ is applied. We may thus define, in terms of the conditional probabilities given $A_a$,

$$\lambda_b^* = \frac{\dfrac{1}{\alpha}\sum_a \max_b \dfrac{p_{ab}}{p_{a\cdot}} - \max_b \dfrac{1}{\alpha}\sum_a \dfrac{p_{ab}}{p_{a\cdot}}}{1 - \max_b \dfrac{1}{\alpha}\sum_a \dfrac{p_{ab}}{p_{a\cdot}}}. \tag{17}$$


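If we do this in the present example, we get, of course, the same altered p table for both hospitals:

              Lived   Died   Total
Treated        .477   .023    .500
Not treated    .125   .375    .500
Total          .602   .398   1.000

and in both cases

$$\lambda_b^* = \frac{.477 + .375 - .602}{1 - .602} = \frac{.250}{.398} = .628.$$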
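Equation (17) amounts to applying $\lambda_b$ to the reweighted table (16). A minimal sketch under that reading (the function name `lambda_b_star` is ours; every row total is assumed positive):

```python
def lambda_b_star(p):
    """lambda_b* of equation (17) for a table of proportions p[a][b].

    Each p[a][b] is replaced by (1/alpha) * p[a][b] / p[a.], as in (16), which
    makes the A classes equiprobable while keeping the conditional B
    distributions; lambda_b is then applied to the reweighted table.
    """
    alpha = len(p)
    weighted = [[x / (alpha * sum(row)) for x in row] for row in p]
    col_totals = [sum(col) for col in zip(*weighted)]
    err_case1 = 1 - max(col_totals)
    err_case2 = 1 - sum(max(row) for row in weighted)
    return (err_case1 - err_case2) / err_case1

hospital_1 = [[0.84, 0.04], [0.03, 0.09]]
hospital_2 = [[0.42, 0.02], [0.14, 0.42]]
print(round(lambda_b_star(hospital_1), 3), round(lambda_b_star(hospital_2), 3))
```

Both hospitals give the same value because their conditional probabilities of life given treatment and given nontreatment coincide.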


An analogous procedure could be used to define $\lambda_a^*$ and $\lambda^*$. Note also that other 'artificial' marginal p's besides .5 could be used if appropriate. Yule [15] suggests as a desideratum for coefficients of association their invariance under transformations on the $\{p_{ab}\}$ matrix of form

$$p_{ab} \to s_a t_b\, p_{ab}, \qquad s_a, t_b > 0;\quad a = 1, \dots, \alpha;\ b = 1, \dots, \beta.$$

Such a transformation may readily be found (at least when no $p_{ab} = 0$) to make all four marginals of a two by two table equal to .5. In this connection, we refer to a recent article by Pompilj [10] in which such transformations are carefully discussed.
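The text does not say how the factors $s_a, t_b$ are found. One standard way, which we swap in here purely as an illustration, is iterative proportional fitting (the Deming-Stephan procedure): alternately rescale the rows and then the columns to the target totals. The function name `equalize_margins` and the fixed iteration count are our own assumptions.

```python
def equalize_margins(p, target=0.5, iterations=200):
    """Rescale a 2x2 table p[a][b] -> s_a * t_b * p[a][b] so all four marginals equal target.

    Iterative proportional fitting: each pass rescales every row to the
    target total, then every column. Assumes no p[a][b] is zero.
    """
    q = [row[:] for row in p]
    for _ in range(iterations):
        # Row step: multiply row a by target / (current row total).
        q = [[x * target / sum(row) for x in row] for row in q]
        # Column step: multiply column b by target / (current column total).
        col_totals = [sum(col) for col in zip(*q)]
        q = [[x * target / col_totals[b] for b, x in enumerate(row)] for row in q]
    return q

hospital_1 = [[0.84, 0.04], [0.03, 0.09]]
print([[round(x, 3) for x in row] for row in equalize_margins(hospital_1)])
```

Each step multiplies whole rows or whole columns, so the cross-ratio $p_{11}p_{22}/(p_{12}p_{21})$, and with it Yule's Q, is left unchanged. That is the invariance Yule asks for.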

All further measures may be considered for unweighted or weighted 
marginal proportions, whichever are appropriate. 


6. MEASURES BASED UPON OPTIMAL PREDICTION OF ORDER 
6.1. Preliminaries


Heretofore we have considered measures of association suitable for 
the unordered case, that is, measures which do not change if the 
columns (rows) are permuted. Now we shall suggest a measure suit- 
able for the ordered case. Suppose that the situation is of the following 
kind: 

(i) Two polytomies, A and B.

(ii) No relevant underlying continua.

(iii) Directed ordering is of interest.

(iv) The two polytomies appear symmetrically.


By (iii) we mean that we wish to distinguish, in the 3×3 case, between, for example,

p11   0     0                0     0     p13
0     p22   0        and     0     p22   0
0     0     p33              p31   0     0

calling the first of these complete association and the second complete 
counterassociation. We may wish to make the convention that in these 
two cases the proposed measure should take the values +1 and —1 
respectively. If the sense or direction of order is irrelevant we can, for 
example, simply take the absolute value of a measure appropriate to 
directed ordering. 





748 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 19% 


There are vaguenesses in the idea of complete ordered association. 
For example, everyone would probably agree that the following case 
is one of complete association: 


























The following situation is not so clear: 


























As before, the procedure we shall adopt toward this and toward more 
complex questions is to base the measure of association on a probabilis- 
tic model of activity which often may be appropriate and typical. 


6.2. A Proposed Measure 


Our proposed model will now be described. Suppose that two individuals are taken independently and at random from the population (technically with replacement, but this is unimportant for large populations). Each falls into some $(A_a, B_b)$ cell. Let us say that the first falls in the $(A_{a_1}, B_{b_1})$ cell, and the second in the $(A_{a_2}, B_{b_2})$ cell. Here $a_1, a_2, b_1, b_2$ are random variables: $a_i$ ($i = 1, 2$) takes values from 1 to $\alpha$; $b_i$ ($i = 1, 2$) takes values from 1 to $\beta$.

If there is independence, one expects that the order of the a's has no connection with the order of the b's. If there is high association one expects that the order of the a's would generally be the same as that of the b's. If there is high counterassociation one expects that the orders would generally be different.
Let us therefore ask about the probabilities for like and unlike orders. In order to avoid ambiguity, these probabilities will be taken conditionally on the absence of ties. Set

$$\Pi_s = \Pr\{a_1 < a_2 \text{ and } b_1 < b_2;\ \text{or } a_1 > a_2 \text{ and } b_1 > b_2\}, \tag{18}$$

$$\Pi_d = \Pr\{a_1 < a_2 \text{ and } b_1 > b_2;\ \text{or } a_1 > a_2 \text{ and } b_1 < b_2\}, \tag{19}$$

$$\Pi_t = \Pr\{a_1 = a_2 \text{ or } b_1 = b_2\}. \tag{20}$$

Then the conditional probability of like orders given no ties is $\Pi_s/(1 - \Pi_t)$ and the conditional probability of unlike orders given no ties is $\Pi_d/(1 - \Pi_t)$. Of course, the sum of these two quantities is one.

A possible measure of association would then be $\Pi_s/(1 - \Pi_t)$, but it is a bit more convenient to look at the following quantity:

$$\gamma = \frac{\Pi_s - \Pi_d}{1 - \Pi_t}, \tag{21}$$

or the difference between the conditional probabilities of like and unlike orders. In other words $\gamma$ tells us how much more probable it is to get like than unlike orders in the two classifications, when two individuals are chosen at random from the population.

Since $\Pi_s + \Pi_d = 1 - \Pi_t$, we may write $\gamma$ as

$$\gamma = \frac{2\Pi_s - 1 + \Pi_t}{1 - \Pi_t}, \tag{22}$$

which is convenient for computation, using the easily checked relationships

$$\Pi_s = 2 \sum_a \sum_b p_{ab} \Big\{ \sum_{a' > a} \sum_{b' > b} p_{a'b'} \Big\}, \tag{23}$$

$$\Pi_t = \sum_a p_{a\cdot}^2 + \sum_b p_{\cdot b}^2 - \sum_a \sum_b p_{ab}^2. \tag{24}$$

Some important properties of $\gamma$ follow:

(i) $\gamma$ is indeterminate if the population is concentrated in a single row or column of the cross-classification table.

(ii) $\gamma$ is 1 if the population is concentrated in an upper-left to lower-right diagonal of the cross-classification table. $\gamma$ is −1 if the population is concentrated in a lower-left to upper-right diagonal of the table.

(iii) $\gamma$ is 0 in the case of independence, but the converse need not hold except in the 2×2 case. An example of nonindependence with $\gamma = 0$ is
For tables up to 5×5 with p's expressed to two decimal places computation is fairly rapid. If many tables of the same size are at hand a cardboard template would be convenient. A check on $\Pi_s$ is to recompute using inverted ordering in both dimensions. $\gamma$ may be rewritten in terms of the $\nu$'s by putting "$\nu_{ab}$" for "$p_{ab}$," etc., and replacing "1" in (22) by "$\nu^2$."
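The computation by (22) through (24) can be sketched in Python as follows. The function name `gamma_components` is ours. As an assumption, $\Pi_d$ is accumulated directly by the unlike-order analogue of (23) (with $b' < b$) rather than obtained by subtraction, so that $\Pi_s + \Pi_d + \Pi_t = 1$ serves as a check. Counts are divided by $\nu$ first, which is equivalent to the $\nu$-based rewriting just described.

```python
def gamma_components(table):
    """Return (pi_s, pi_d, pi_t, gamma) for an ordered table of counts or proportions.

    Rows and columns must be listed in their natural order.
    """
    nu = sum(sum(row) for row in table)
    q = [[x / nu for x in row] for row in table]
    alpha, beta = len(q), len(q[0])
    pi_s = pi_d = 0.0
    for a in range(alpha):
        for b in range(beta):
            below = q[a + 1:]                                            # rows a' > a
            later = sum(r[bb] for r in below for bb in range(b + 1, beta))   # b' > b: like order
            earlier = sum(r[bb] for r in below for bb in range(b))          # b' < b: unlike order
            pi_s += 2 * q[a][b] * later                                  # equation (23)
            pi_d += 2 * q[a][b] * earlier
    rows = [sum(r) for r in q]
    cols = [sum(c) for c in zip(*q)]
    pi_t = (sum(x * x for x in rows) + sum(x * x for x in cols)
            - sum(x * x for r in q for x in r))                          # equation (24)
    gamma = (2 * pi_s - 1 + pi_t) / (1 - pi_t)                           # equation (22)
    return pi_s, pi_d, pi_t, gamma
```

Applied to the education and fertility-planning table of Section 6.3 this gives approximately $\Pi_s = .301$, $\Pi_d = .163$, $\Pi_t = .536$; on any 2×2 table it reproduces Yule's Q.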

In the 2×2 case we find that

$$\gamma = \frac{p_{11}p_{22} - p_{12}p_{21}}{p_{11}p_{22} + p_{12}p_{21}}. \tag{25}$$

This is the same as Yule's coefficient of association Q mentioned in Section 4. In this case $\gamma = \pm 1$ if any one cell is empty. For example,

p11   0
p21   p22

gives rise to $\gamma = 1$ always.
Any case of the following forms will give rise to $\gamma = 1$, since a conflict in order is impossible:








Pi2 0 








P22 par 0 








0 P33 Pat Ps2 
































The right-hand table might be thought of as a case of “complete curvi- 
linear association.” 

Stuart [11], starting from a suggestion by Kendall [6], has proposed a measure of association in the ordered case much like $\gamma$. Stuart's measure, which he calls $\tau_c$, is, in our notation,

$$\tau_c = \frac{\Pi_s - \Pi_d}{(m - 1)/m},$$

where $m = \min(\alpha, \beta)$. The term $(m - 1)/m$ is introduced in order that $\tau_c$ may attain, or nearly attain, the absolute value 1 when the entire population lies in a longest diagonal of the table. Stuart develops his measure by considering a two-way ordered classification table as two rankings of a population, where many ties appear in one or both rankings as two individuals of the population fall in the same column or row or both. Then each ordered pair of individuals is assigned a score with respect to each ranking: 0 if there is a tie, or ±1 as one or the other is ranked higher. Finally the product-moment correlation coefficient is formally computed with these scores, and the norming factor is introduced.

Thus, our development of $\gamma$ is seen to give another and more natural interpretation for the numerator of $\tau_c$: it is the probability of like order less the probability of unlike order when two individuals are chosen at random. In addition the form in which $\tau_c$ is given above, together with (23) and (24), suggests a computation procedure somewhat different from that of [11].
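Computed through $\Pi_s$ and $\Pi_d$ as suggested above, $\tau_c$ needs only a short routine. In the sketch below the names `concordance` and `stuart_tau_c` are ours, and the input is assumed to be a table of counts in its natural order.

```python
def concordance(q):
    """Return (pi_s, pi_d) for a table of proportions q[a][b] listed in natural order."""
    alpha, beta = len(q), len(q[0])
    pi_s = pi_d = 0.0
    for a in range(alpha):
        for b in range(beta):
            for a2 in range(a + 1, alpha):
                pi_s += 2 * q[a][b] * sum(q[a2][b + 1:])   # a' > a and b' > b
                pi_d += 2 * q[a][b] * sum(q[a2][:b])       # a' > a and b' < b
    return pi_s, pi_d

def stuart_tau_c(table):
    """Stuart's tau_c in the form given above: (pi_s - pi_d) / ((m - 1) / m)."""
    nu = sum(sum(row) for row in table)
    q = [[x / nu for x in row] for row in table]
    pi_s, pi_d = concordance(q)
    m = min(len(q), len(q[0]))
    return (pi_s - pi_d) / ((m - 1) / m)

print(stuart_tau_c([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
```

For a population spread evenly along the main diagonal of a 3×3 table, $\Pi_s - \Pi_d = 2/3$, and the norming factor $(m-1)/m = 2/3$ brings $\tau_c$ up to 1, while $\gamma$ needs no such factor.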


6.3. An Example 


Whelpton, Kaiser, and others [17] have investigated in great detail 
relationships between human fertility and a number of social and psy- 
chological characteristics of married couples. The analyses resulting 
from these investigations are replete with cross-classification tables, 
together with accompanying verbal explanations and recapitulations. 
Numerical indexes of association appear to have been used rarely, if at 
all, in this work. 

We wish to examine briefly one of these cross-classification tables 
as an example of a cross-classification with an order in both classifica- 
tions. This examination should be construed neither as approval nor 
criticism of the methodology used in the studies edited by Whelpton 
and Kaiser, for this would not be appropriate here. (The reader may 
refer to [18] and [19] for critical reviews.) However, we do feel that 
the use of summarizing indexes of association in a study of this kind 
may well be worth while for at least two reasons. One is that the 
reader finds it very difficult to obtain a bird’s-eye view of the extensive 
numerical material without depending almost wholly on the author’s 
own conclusions. Second, the use of indexes would mitigate the criticism that the author, consciously or not, selects from his numerical data those comparisons that are in line with his a priori beliefs. Needless to say, an index of association is recommended by these arguments only if it has some reasonable interpretation.

The particular table we wish to consider follows, in terms of numbers 
of married couples. It refers to a rather special, but well defined, popu- 
lation: white Protestant married couples living in Indianapolis, mar- 
ried in 1927, 1928, or 1929, and so on. The data were obtained by strati- 
fied sampling, with strata based on numbers of live births. However, 
for present purposes we do not consider any questions of sampling, 
response error, specification of population, etc. The table is condensed 
from a more detailed cross-classification given in [17], vol. 2, pp. 286, 
389, and 402. Further, we shall not define the fertility-planning cate- 
gories that follow, but merely indicate the order. 


CROSS-CLASSIFICATION BETWEEN EDUCATIONAL LEVEL OF WIFE AND FERTILITY-PLANNING STATUS OF COUPLE. SOURCE [17], VOL. 2. NUMBERS IN BODY OF TABLE ARE FREQUENCIES

Highest level of formal           Fertility-planning status of couple
education of wife                   A      B      C      D    Row totals
One year college or more          102     35     68     34       239
3 or 4 years high school          191     80    215    122       608
Less than 3 years high school     110     90    168    223       591
Column totals                     403    205    451    379      1438

A: most effective planning of number and spacing of children; D: least effective planning of children.








This is clearly a case where there is relevant order in both classifications. We may first compute $\Pi_s$ as follows (schematically):

$$\Pi_s = \frac{2}{(1438)^2}\big[102(80 + 90 + 215 + 168 + 122 + 223) + 35(215 + 168 + 122 + 223) + \cdots + 215(223)\big]$$

$$= \frac{2\,[102 \times 898 + 35 \times 728 + \cdots + 215 \times 223]}{(1438)^2} = \frac{2 \times 311{,}632}{2{,}067{,}844} = .301.$$


This means that if we pick two couples at random from those included in the table, the probability is .301 that they are not tied in either classification and that they fall in the same order for both classifications (e.g., if educational level of wife is greater for the first couple chosen, then effectiveness of fertility planning is also greater).

Similarly we compute that $\Pi_d = .163$. This is the probability of no ties and different orders. Finally $\Pi_t$, the probability of a tie in at least one classification, is .536. Note that $\Pi_s + \Pi_d + \Pi_t = 1.000$.

The conditional probability of like order, given no tie, is $\Pi_s/(1 - \Pi_t) = .301/.464 = .649$; and the conditional probability of unlike order is $.163/.464 = .351$. Clearly there is a greater chance of like order than of unlike order, and this means positive association, if the operational model is a reasonable one. To measure the magnitude of this association we may use $\gamma$, which here is equal to

$$\gamma = \frac{.301 - .163}{.464} = .298.$$

This is the difference between the conditional probabilities of like and unlike order, given no ties.

It might be thought that one should look, not at the actual popula- 
tion above, but at a related population with equal row totals and with 
the same relative frequencies within each row. That is, we might wish 
to work with a derived population within which one-third of the wives 
lie in each education category, but which is otherwise the same. This 
derived population is readily obtained (in terms of its $p_{ab}$'s) by dividing
each frequency in the above table by three times the total in its row.
Very minor adjustments were made because of rounding, in order that 
the over-all sum be 1.000. For the same reason, the row totals are not 
exactly equal. 
CROSS-CLASSIFICATION BETWEEN EDUCATIONAL LEVEL OF WIFE AND FERTILITY-PLANNING STATUS OF COUPLE. DERIVED FROM PRIOR TABLE BY ADJUSTMENT TO MAKE ROW TOTALS EQUAL. NUMBERS IN BODY OF TABLE ARE RELATIVE FREQUENCIES ($p_{ab}$'s).





Fertility-planning status of couple 





A B C D 
Highest level Most 
of formal effective Least 
education planning effective Row 
of wife of number planning totals 
and spac- of children 
ing of 
children 





one year college 
or more .142 


3 or 4 years high 
school .105 i .118 


less than 3 years 
high school .062 ‘ .095 

















Column totals .309 ; .308 





For this table we find $\Pi_s = .325$, $\Pi_d = .170$, $\Pi_t = .505$.

Hence $\Pi_s/(1 - \Pi_t) = .657$, $\Pi_d/(1 - \Pi_t) = .343$, and $\gamma = .314$. There is no great difference between the original and the adjusted table in regard to association as measured by probabilities of like and unlike order.

Alternatively, one might wish to adjust the tabular entries so that column totals are equal, or one might attempt to adjust the entries so that the row totals are equal and the column totals are equal.


7. THE GENERATION OF MEASURES BY THE INTRODUCTION 
OF LOSS FUNCTIONS 


7.1. Models Based on Loss Functions 


Instead of obtaining a measure as a natural function of probabilities 
in the context of a model of predictive behavior, one can more generally 
employ loss functions. In such a way, one can even artificially generate 
the conventional measures described in Section 4. 
7.2. Loss Functions and the λ Measures

In the context of Section 5.1 let us suppose that in guessing an individual's B class one incurs a loss $L(b_1, b_2)$, where $B_{b_1}$ is the true B class and $B_{b_2}$ is the guessed one. Consider first guessing B, given no information. Then a scheme of guessing $B_b$ with probability $\rho_b$ ($\rho_b \ge 0$, $\sum_b \rho_b = 1$) leads to an average loss of $\sum_{b_1}\sum_{b_2} p_{\cdot b_1}\rho_{b_2} L(b_1, b_2)$. It is easily seen that this average is minimized by guessing that $B_{b_2}$ for which $\sum_b p_{\cdot b} L(b, b_2)$ is a minimum, or if there are two or more minima by guessing any one of them. Let $b_2^0$ be any one of these $b_2$'s, so that the minimum average loss is $\sum_b p_{\cdot b} L(b, b_2^0)$.

On the other hand if the individual's A class is known to be $A_a$, the best scheme of guessing is to select $b_2$ to minimize $\sum_b p_{ab} L(b, b_2)$. Let $b_{2a}^0$ be such a minimizing $b_2$; then the minimum average loss when $A_a$ is known is $\sum_b (p_{ab}/p_{a\cdot}) L(b, b_{2a}^0)$, and the over-all minimum average loss with the $A_a$'s known is $\sum_a \sum_b p_{ab} L(b, b_{2a}^0)$.

The decrease in loss as we pass from the first case to the second is therefore

$$\sum_b p_{\cdot b} L(b, b_2^0) - \sum_a \sum_b p_{ab} L(b, b_{2a}^0). \tag{26}$$

It would be reasonable to norm this by division by the first term, $\sum_b p_{\cdot b} L(b, b_2^0)$, to obtain a generalization of $\lambda_b$.

Notice that if $L(b_1, b_2)$ is 0 when $b_1 = b_2$ and 1 when $b_1 \ne b_2$, we obtain exactly $\lambda_b$. Analogous procedures give us generalizations of $\lambda_a$ and $\lambda$. A slight extension of the procedure, permitting the loss to depend on the true A class as well as the true and guessed B classes, gives a generalization of $\lambda_b^*$.
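The normed version of (26) can be sketched directly. Here the loss is supplied as a matrix `loss[b1][b2]` (true class, guessed class); that representation and the function name `lambda_b_loss` are our assumptions. With the zero-one loss the function reproduces $\lambda_b$.

```python
def lambda_b_loss(p, loss):
    """(26) divided by its first term, for proportions p[a][b] and loss[b1][b2].

    loss[b1][b2] is the loss when the true B class is b1 and the guess is b2.
    """
    beta = len(p[0])
    col_totals = [sum(col) for col in zip(*p)]
    # A unknown: one best guess b2 for everyone, minimizing sum_b p_.b L(b, b2).
    loss_case1 = min(sum(col_totals[b] * loss[b][b2] for b in range(beta))
                     for b2 in range(beta))
    # A known: for each row a, the guess minimizing sum_b p_ab L(b, b2); add over a.
    loss_case2 = sum(min(sum(row[b] * loss[b][b2] for b in range(beta))
                         for b2 in range(beta))
                     for row in p)
    return (loss_case1 - loss_case2) / loss_case1

hospital_1 = [[0.84, 0.04], [0.03, 0.09]]
zero_one = [[0, 1], [1, 0]]
print(round(lambda_b_loss(hospital_1, zero_one), 3))
```

A hypothetical asymmetric loss such as `[[0, 1], [5, 0]]`, which makes predicting "Lived" for a patient who died five times as costly, changes the optimal guesses and hence the value of the measure.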


7.3. The Conventional Measures in Terms of Loss Functions 


Suppose, instead of predicting the classes of individuals, we are asked to determine the values $p_{ab}$ when only the $p_{a\cdot}$ and $p_{\cdot b}$ are known. In the case of independence, these $p_{ab}$ are $p_{a\cdot}p_{\cdot b}$. In the more general case, the difference between $p_{ab}$ and $p_{a\cdot}p_{\cdot b}$ may be thought of as the amount of error made by assuming independence. If the loss is proportional to the square of the error, inversely proportional to the estimate $p_{a\cdot}p_{\cdot b}$, and additive, we have

$$\sum_a \sum_b k_{ab} \frac{(p_{ab} - p_{a\cdot}p_{\cdot b})^2}{p_{a\cdot}p_{\cdot b}}, \tag{27}$$

where the $k_{ab}$'s are given constants. For comparison with standard chi-square, express this in terms of the $\nu_{ab}$'s:

$$\sum_a \sum_b \frac{k_{ab}}{\nu}\, \frac{(\nu_{ab} - \nu_{a\cdot}\nu_{\cdot b}/\nu)^2}{\nu_{a\cdot}\nu_{\cdot b}/\nu}, \tag{28}$$

and finally set $k_{ab} = \nu$ to obtain just the chi-square statistic.
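The reduction from (27) to chi-square can be checked numerically. The sketch below (function names `weighted_loss` and `chi_square` are ours) evaluates (27) from proportions and the first form of (2) from counts, assuming no empty row or column.

```python
def weighted_loss(table, k):
    """Loss (27): sum_ab k[a][b] (p_ab - p_a. p_.b)^2 / (p_a. p_.b), from counts."""
    nu = sum(sum(r) for r in table)
    p = [[x / nu for x in r] for r in table]
    rows = [sum(r) for r in p]
    cols = [sum(c) for c in zip(*p)]
    return sum(k[a][b] * (p[a][b] - rows[a] * cols[b]) ** 2 / (rows[a] * cols[b])
               for a in range(len(p)) for b in range(len(cols)))

def chi_square(table):
    """Chi-square statistic, first form of (2)."""
    nu = sum(sum(r) for r in table)
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return sum((table[a][b] - rows[a] * cols[b] / nu) ** 2 / (rows[a] * cols[b] / nu)
               for a in range(len(table)) for b in range(len(cols)))

t = [[3, 1], [2, 4]]
k = [[10, 10], [10, 10]]   # k_ab = nu = 10 for every cell
print(weighted_loss(t, k), chi_square(t))
```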

Although this procedure and loss function seem to us rather arti- 
ficial, they do give one way of motivating the chi-square statistic as a 
measure of association. 


8. RELIABILITY MODELS 


8.1. Generalities 


Consider now cases in which the classes are the same for the two polytomies, so that we deal with an α×α table, but differ in that assignment to class depends on which of two methods of assignment is used. Thus we might for example consider two psychological tests both of which classify deranged individuals as to the type of mental disorder from which they suffer. Or again, we might consider two observers taking part in a sociological experiment wherein they independently and subjectively rate each child in a group of children on a five-point scale for degree of cooperation.

One is often concerned in such cases with the degree to which the 
two methods of assignment to class agree with each other. In the case 
of the psychological tests, for example, one of the tests might be a well 
established standard procedure and the other might be a more easily 
applied variant under consideration as a substitute. The psychologist 
would probably only consider the variant seriously if it gave the same 
answers as the standard test often enough in some sense which he would 
have to explicate. In the case of the two observers, the problem might 
be whether the kind of subjective ratings given by trained observers 
in that context are similar enough to warrant the use of such subjective 
ratings at all. 

As before, we shall not consider sampling problems here, but rather shall suppose the population $p_{ab}$'s known. The several distinctions and conventions of Sections 2 and 3 apply here, of course, but the measures suggested in Sections 5 and 6 do not seem appropriate in this reliability context. One reason is that the classes are the same for both polytomies. This means that even in the unordered case we do not want a measure which is invariant under interchange of rows and interchange of columns unless the two interchanges are the same.

An obvious measure of reliability in such a study is just $\sum_a p_{aa}$, the probability of agreement. However, we shall also consider some other possibilities.


8.2. A Measure of Reliability in the Unordered Case 


The measure we shall now propose might be appropriate under the 
following conditions: 


(i) The two polytomies are the same, but arise from different methods of assignment to class.
(ii) No relevant underlying continua.
(iii) No relevant ordering.
(iv) Our interest in reliability is symmetrical as between the two polytomies.


A modal class over both classifications is any $A_a\ (= B_a)$ such that $p_{a\cdot} + p_{\cdot a} \ge p_{a'\cdot} + p_{\cdot a'}$ for all $a'$. It is simplest to suppose that there is a unique modal class, but if there are more, any can be chosen. Denote by $p_{M\cdot}$ and $p_{\cdot M}$ the two marginal proportions corresponding to the modal class.

A modal class can be given the following interpretation: choose an individual at random from the population and pick one of the two methods of assignment by flipping a fair coin. What is the long-run best guess beforehand of how the chosen method will classify the chosen individual? The answer is: the modal class; and if the modal class is $A_M$, then the probability of a correct guess is $\tfrac{1}{2}\max_a (p_{a\cdot} + p_{\cdot a}) = \tfrac{1}{2}(p_{M\cdot} + p_{\cdot M})$.

In so far as there is good reliability between the two methods of assignment, one could make a better guess if one knew how the other method of assignment would classify the individual, and then followed the rule of guessing the same class for the method being predicted. The probability of a correct guess would then be $\sum_a p_{aa}$. Thus as we go from the no-information situation to the other-method-known situation, the probability of error decreases by $\sum_a p_{aa} - \tfrac{1}{2}(p_{M\cdot} + p_{\cdot M})$. This quantity may vary from $-\tfrac{1}{2}$ to $1 - (1/\alpha)$. It takes the value $-\tfrac{1}{2}$ when all the diagonal $p_{aa}$'s are zero and the modal probability, $p_{M\cdot} + p_{\cdot M}$, is 1. It takes the value $1 - (1/\alpha)$ when the two methods always agree and each category is equiprobable.





758 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1954 


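To get a measure we should alter the above quantity. A sufficiently large $p_{aa}$ for some $a$ raises the modal probability $p_{M\cdot} + p_{\cdot M}$, and so will make the above quantity low even though $\sum_a p_{aa}$ is nearly 1. It seems reasonable to normalize by dividing by the probability of error given no information, that is, by $1 - \tfrac{1}{2}(p_{M\cdot} + p_{\cdot M})$. Hence we propose the measure

$$\lambda_r = \frac{\sum_a p_{aa} - \tfrac{1}{2}(p_{M\cdot} + p_{\cdot M})}{1 - \tfrac{1}{2}(p_{M\cdot} + p_{\cdot M})}. \tag{29}$$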


This may be interpreted as the relative decrease in error probability as 
we go from the no information situation to the other-method-known 
situation. 

The measure $\lambda_r$ can take values from $-1$ to 1. It takes the value $-1$ when all the diagonal $p_{aa}$'s are zero and the modal probability, $p_{M\cdot} + p_{\cdot M}$, is 1. It takes the value 1 when the two methods always agree. $\lambda_r$ is indeterminate only when both methods always give only one and the same class. In the case of independence, $\lambda_r$ assumes no particular value. This characteristic might be considered a disadvantage, but it seems to us that an index of this kind would only be used where there is known to be dependence between the methods, so that misbehavior of the index for independence is not important.
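A minimal Python sketch of $\lambda_r$ follows. It assumes a square table of population proportions summing to 1, and the function name is hypothetical. The sketch computes the probability of agreement, picks a modal class by maximizing $p_{a\cdot} + p_{\cdot a}$ (ties are harmless, since only the maximum value is used), and normalizes by the no-information error probability.

```python
def reliability_lambda_r(p):
    """lambda_r for an alpha x alpha table p[a][b] of population proportions."""
    alpha = len(p)
    if any(len(row) != alpha for row in p):
        raise ValueError("table must be alpha x alpha")
    agree = sum(p[a][a] for a in range(alpha))            # sum_a p_aa
    row_m = [sum(row) for row in p]                       # p_a.
    col_m = [sum(p[a][b] for a in range(alpha)) for b in range(alpha)]  # p_.b
    modal = max(row_m[a] + col_m[a] for a in range(alpha))  # p_M. + p_.M
    no_info_correct = modal / 2                           # best guess with no information
    denom = 1 - no_info_correct
    if denom == 0:
        raise ValueError("indeterminate: both methods always give one and the same class")
    return (agree - no_info_correct) / denom

print(reliability_lambda_r([[0.4, 0.1], [0.1, 0.4]]))
```

The two extreme cases named in the text are convenient sanity checks. A table with perfect agreement gives 1. The table `[[0.0, 1.0], [0.0, 0.0]]` has an empty diagonal and modal probability 1, and gives $-1$.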


8.3. Reliability in the Ordered Case


For the case in which the classes are ordered, but a meaningful metric is absent, we have been unable to find a measure better than one of the following kind:

$$\sum_{a=1}^{\alpha} p_{aa} \quad \text{(as suggested in Section 8.1)}, \tag{30a}$$

$$\sum_{|a-b| \le 1} p_{ab}, \tag{30b}$$

$$\sum_{|a-b| \le 2} p_{ab}; \tag{30c}$$
that is, the only reasonable measures we know of are those that are based upon either the probability of agreement, the probability of agreement to within one neighboring class, two neighboring classes, and so on. If desired, one could weight these probabilities when classification in a neighboring class is not as desirable as in the same class. Thus one might consider something like $\sum_a p_{aa} + \tfrac{1}{2}\sum_{|a-b|=1} p_{ab}$ or its obvious variants. These measures may also be justified easily by loss-function arguments.
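The family (30a)–(30c) and the weighted variant amount to summing bands around the diagonal of the table. The short sketch below shows this. The 3 × 3 table is hypothetical, and the weight $w = \tfrac{1}{2}$ is the example value from the text.

```python
def agreement_within(p, r):
    """Sum of p_ab over cells with |a - b| <= r; r = 0, 1, 2 give (30a), (30b), (30c)."""
    return sum(
        p[a][b]
        for a in range(len(p))
        for b in range(len(p[a]))
        if abs(a - b) <= r
    )

def weighted_agreement(p, w=0.5):
    """sum_a p_aa + w * sum_{|a-b|=1} p_ab (w = 1/2 is the variant mentioned in the text)."""
    diagonal = agreement_within(p, 0)
    adjacent = agreement_within(p, 1) - diagonal   # cells exactly one class off
    return diagonal + w * adjacent

p = [[0.20, 0.10, 0.00],
     [0.05, 0.30, 0.05],
     [0.00, 0.10, 0.20]]
print(agreement_within(p, 0), agreement_within(p, 1), weighted_agreement(p))
```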
9. PROPORTIONAL PREDICTION


Instead of basing a measure of association on optimal prediction, one might consider measures based upon a prediction method which reconstructs the population, in a sense to be described. The use of such a measure was suggested to us by W. Allen Wallis. For simplicity, we restrict ourselves to the asymmetric situation of Section 5.1, where $\lambda_b$ was constructed. Of course, one could apply the same approach in other situations.

Our model of activity, as before, is the following: An individual is 
chosen at random from the population and we are asked to guess his 
B class either (1) given no information or (2) given his A class. 

Optimal guessing will lead to a definite B class in case (1) and to a definite B class for each A class in case (2) (except that in the case of tied $p_{\cdot b}$'s or $p_{ab}$'s we have some choice). While such optimal guessing leads to the lowest average frequency of error, the resulting distribution of guessed classes will usually be very different from the original distribution in the population. For some purposes this might be undesirable, and one is led to the following model of activity:


Case 1. Guess $B_1$ with probability $p_{\cdot 1}$, $B_2$ with probability $p_{\cdot 2}$, ..., $B_\beta$ with probability $p_{\cdot\beta}$.

Case 2. Guess $B_1$ with probability $p_{a1}/p_{a\cdot}$ (the conditional probability of $B_1$ given $A_a$), $B_2$ with probability $p_{a2}/p_{a\cdot}$, ..., $B_\beta$ with probability $p_{a\beta}/p_{a\cdot}$.


In each case the guessing is to proceed by throwing a $\beta$-sided die whose $b$th side appears with probability $p_{\cdot b}$ (case 1) or $p_{ab}/p_{a\cdot}$ (case 2). This may be accomplished using a table of “random numbers.” If we make many such guesses independently, it is plain that we shall approximately reconstruct the marginal distribution of the $B_b$'s (case 1) and the joint distribution of the $(A_a, B_b)$'s (case 2).
The long-run proportion of correct predictions in case (1) will be $\sum_{b=1}^{\beta} p_{\cdot b}^2$, and in case (2) it will be $\sum_{a=1}^{\alpha}\sum_{b=1}^{\beta} p_{ab}^2/p_{a\cdot}$. Hence the relative decrease in the proportion of incorrect predictions as we go from case (1) to case (2) is

$$\tau_b = \frac{\sum_a \sum_b p_{ab}^2/p_{a\cdot} - \sum_b p_{\cdot b}^2}{1 - \sum_b p_{\cdot b}^2}, \tag{31}$$

which can be readily expressed in the chi-square-like form

$$\tau_b = \frac{\sum_a \sum_b (p_{ab} - p_{a\cdot}\,p_{\cdot b})^2/p_{a\cdot}}{1 - \sum_b p_{\cdot b}^2}. \tag{32}$$

It is clear that $\tau_b$ takes values between 0 and 1; it is 0 if and only if there is independence, and 1 if and only if knowledge of $A_a$ completely determines $B_b$. Finally, $\tau_b$ is indeterminate if and only if both independence and determinism simultaneously hold, that is, if all $p_{\cdot b}$'s but one are zero.
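Forms (31) and (32) can be compared directly. The sketch below implements both for a table of population proportions; the function names are hypothetical. Rows with $p_{a\cdot} = 0$ are skipped, on the assumption that they contribute nothing.

```python
def tau_b(p):
    """Form (31): relative decrease in error for proportional prediction of B."""
    rows = [sum(r) for r in p]
    cols = [sum(c) for c in zip(*p)]
    no_info = sum(c * c for c in cols)        # case (1): sum_b p_.b^2
    denom = 1.0 - no_info
    if denom <= 1e-12:
        raise ValueError("tau_b is indeterminate: all but one p_.b are zero")
    with_a = sum(                             # case (2): sum_a sum_b p_ab^2 / p_a.
        p[a][b] ** 2 / rows[a]
        for a in range(len(rows)) if rows[a] > 0
        for b in range(len(cols))
    )
    return (with_a - no_info) / denom

def tau_b_chi_form(p):
    """Form (32): the same quantity written like a chi-square."""
    rows = [sum(r) for r in p]
    cols = [sum(c) for c in zip(*p)]
    denom = 1.0 - sum(c * c for c in cols)
    num = sum(
        (p[a][b] - rows[a] * cols[b]) ** 2 / rows[a]
        for a in range(len(rows)) if rows[a] > 0
        for b in range(len(cols))
    )
    return num / denom

p1 = [[0.3, 0.1], [0.1, 0.5]]
print(tau_b(p1), tau_b_chi_form(p1))
```

An independent table such as `[[0.12, 0.28], [0.18, 0.42]]` gives 0, up to rounding. A table such as `[[0.4, 0.0], [0.0, 0.6]]`, in which A determines B, gives 1.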


10. ASSOCIATION WITH A PARTICULAR CATEGORY 


A group of modifications of many of the preceding measures arises from the observation that there may be little association between the A and B polytomies in general, but if an individual is in a particular A class it may be easy to predict his B class. Suppose, then, that we want the association between $A_{a_0}$, a specific A class, and the B polytomy. One need only condense all the $A_a$ rows where $a \ne a_0$ into a single row, thus obtaining a $2 \times \beta$ table, and apply whatever measure of association is thought appropriate. The table will have this appearance:

$$\begin{array}{c|cccc}
 & B_1 & B_2 & \cdots & B_\beta \\ \hline
A_{a_0} & p_{a_0 1} & p_{a_0 2} & \cdots & p_{a_0 \beta} \\
A_a\ (a \ne a_0) & p_{\cdot 1} - p_{a_0 1} & p_{\cdot 2} - p_{a_0 2} & \cdots & p_{\cdot\beta} - p_{a_0\beta}
\end{array}$$
We are indebted to L. L. Thurstone for pointing out to us the impor- 
tance of this modification. 
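The condensing step is mechanical, so a short sketch may help. The text leaves the choice of measure open. Here, as an assumption, Goodman and Kruskal's $\lambda_b = (\sum_a \max_b p_{ab} - \max_b p_{\cdot b})/(1 - \max_b p_{\cdot b})$ is applied both to the full table and to the condensed $2 \times \beta$ table. The table and the function names are hypothetical.

```python
def collapse_to_category(p, a0):
    """2 x beta table: row A_{a0}, then all other A rows summed into one."""
    cols = [sum(c) for c in zip(*p)]                  # p_.b
    return [list(p[a0]), [cols[b] - p[a0][b] for b in range(len(cols))]]

def lambda_b(p):
    """Goodman-Kruskal lambda_b (optimal prediction of B from A)."""
    cols = [sum(c) for c in zip(*p)]
    best_col = max(cols)
    if best_col >= 1.0:
        raise ValueError("lambda_b is indeterminate for this table")
    return (sum(max(row) for row in p) - best_col) / (1.0 - best_col)

p = [[0.0, 0.0, 0.2],    # A_1 points clearly at B_3
     [0.3, 0.1, 0.0],
     [0.1, 0.3, 0.0]]
condensed = collapse_to_category(p, 0)
print(condensed, lambda_b(condensed), lambda_b(p))
```

In this example the condensed table isolates the predictive value of $A_1$ alone. Its $\lambda_b$ of about 1/3 differs from the full-table value of about 2/3.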


11. PARTIAL ASSOCIATION


When there are more than two polytomies, it is natural to think of partial association between two of them with the effect of the others averaged out in some sense. Two such measures of partial association will be suggested here for the asymmetrical case and three polytomies. The viewpoint will be that of optimal prediction. Analogous symmetrical measures may be readily obtained, and the restriction to three polytomies is purely for convenience of notation. The first two polytomies will be denoted as before; the third will consist of the classification $C_1, C_2, \ldots, C_\gamma$. The proportion of the population in $A_a$, $B_b$, and $C_c$ is $p_{abc}$, and dots will be used to denote marginal values in the conventional way. The proposed measures will be for partial association between the A and B polytomies ‘averaged’ over the C polytomy. (Do not confuse the integer $\gamma$ used here with the index $\gamma$ of Section 6.)


11.1. Simple Average of $\lambda_b$'s

For fixed $C_c$, we have a conditional $A \times B$ double polytomy with relative frequencies $p_{abc}/p_{\cdot\cdot c}$. Hence we can compute $\lambda_b$ for each such table; call it $\lambda_b(c)$ to show dependence on $c$. Now it might seem natural to average these values with weights equal to the marginal relative frequencies of the C classifications. That is, we suggest

$$\bar{\lambda}_b(A, B \mid C) = \sum_{c=1}^{\gamma} p_{\cdot\cdot c}\,\lambda_b(c). \tag{33}$$
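A minimal sketch of (33) follows, assuming the three-way proportions are stored as nested lists indexed `p[a][b][c]`. The $2 \times 2 \times 2$ population is hypothetical. Each conditional table is rescaled by its weight $p_{\cdot\cdot c}$, its $\lambda_b$ is computed, and the results are averaged with those weights.

```python
def lambda_b(table):
    """Goodman-Kruskal lambda_b for a two-way table whose entries sum to 1."""
    cols = [sum(col) for col in zip(*table)]
    best_col = max(cols)
    if best_col >= 1.0:
        raise ValueError("lambda_b is indeterminate for this table")
    return (sum(max(row) for row in table) - best_col) / (1.0 - best_col)

def conditional_table(p, c):
    """Return p_..c and the conditional A x B table p_abc / p_..c."""
    alpha, beta = len(p), len(p[0])
    weight = sum(p[a][b][c] for a in range(alpha) for b in range(beta))
    table = [[p[a][b][c] / weight for b in range(beta)] for a in range(alpha)]
    return weight, table

def average_lambda_b(p):
    """Formula (33): sum over c of p_..c * lambda_b(c)."""
    total = 0.0
    for c in range(len(p[0][0])):
        weight, table = conditional_table(p, c)
        total += weight * lambda_b(table)
    return total

# Hypothetical population, indexed p[a][b][c]
p = [
    [[0.20, 0.05], [0.05, 0.20]],
    [[0.05, 0.10], [0.20, 0.15]],
]
print(average_lambda_b(p))
```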


11.2. Measure Based Directly on Probabilities of Error 


It seems to us somewhat better, from the viewpoint of interpretation, to proceed as follows. For given $C_c$, if we predict B classes optimally on the basis of no further information, the probability of error is $1 - (\max_b p_{\cdot bc})/p_{\cdot\cdot c}$, whereas if we know the A class the probability of error is $1 - (\sum_a \max_b p_{abc})/p_{\cdot\cdot c}$. Hence, if we are given individuals from the population at random and always told their C class, the probability of error in optimal guessing if we know nothing more is $1 - \sum_c \max_b p_{\cdot bc}$, whereas if we also know the A class the probability is $1 - \sum_c \sum_a \max_b p_{abc}$. Thus the relative decrease in probability of error is

$$\lambda_b(A, B \mid C) = \frac{\sum_c \sum_a \max_b p_{abc} - \sum_c \max_b p_{\cdot bc}}{1 - \sum_c \max_b p_{\cdot bc}}, \tag{34}$$

which might often be a satisfactory measure of partial association.
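Formula (34) needs only two sums of maxima. The sketch below computes them for the same kind of hypothetical `p[a][b][c]` layout. The comments match each sum to the error probabilities described above.

```python
def partial_lambda_b(p):
    """Formula (34) for a population given as nested lists p[a][b][c]."""
    alpha, beta, gamma = len(p), len(p[0]), len(p[0][0])
    # sum_c max_b p_.bc: probability of a correct optimal guess of B when only C is known
    only_c = sum(
        max(sum(p[a][b][c] for a in range(alpha)) for b in range(beta))
        for c in range(gamma)
    )
    # sum_c sum_a max_b p_abc: the same when both A and C are known
    a_and_c = sum(
        max(p[a][b][c] for b in range(beta))
        for a in range(alpha)
        for c in range(gamma)
    )
    if only_c >= 1.0:
        raise ValueError("indeterminate: B is predicted perfectly from C alone")
    return (a_and_c - only_c) / (1.0 - only_c)

p = [
    [[0.20, 0.05], [0.05, 0.20]],
    [[0.05, 0.10], [0.20, 0.15]],
]
print(partial_lambda_b(p))
```

For this particular population, (34) gives 0.375, while the simple average (33) gives 0.3. The two proposals therefore need not agree.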


12. MULTIPLE ASSOCIATION


When there are more than two polytomies, one may well be interested in the multiple association between one of them and all the others. One simple way of handling this in the unordered case will be described here for three polytomies A, B, and C as defined in Section 11. We suppose that the multiple association between A and B-together-with-C is of interest. Simply form a two-way table whose rows represent the A polytomy and whose columns represent all combinations $B_b C_c$, and then apply the appropriate two-polytomy measure. The table will have this appearance:

$$\begin{array}{c|cccc}
 & B_1C_1 & B_1C_2 & \cdots & B_\beta C_\gamma \\ \hline
A_1 & p_{111} & p_{112} & \cdots & p_{1\beta\gamma} \\
A_2 & p_{211} & p_{212} & \cdots & p_{2\beta\gamma} \\
\vdots & \vdots & \vdots & & \vdots \\
A_\alpha & p_{\alpha 11} & p_{\alpha 12} & \cdots & p_{\alpha\beta\gamma}
\end{array}$$

Note that this procedure does not take the $B \times C$ association into account. There is a rough analogy here with the motivation for the standard multiple correlation coefficient of normal theory. The standard multiple correlation coefficient may be (and often is) motivated by defining it as the maximum correlation coefficient obtainable between a given variate and linear combinations of the other variates. That is, it is a measure of association between a given variate and the best estimate (in a certain sense) of that variate based upon all the other variates. It is true that the standard multiple correlation coefficient may be expressed as a function of the several ordinary bivariate correlation coefficients, but in a sense this is a consequence of the strong structural assumption of multivariate normality.
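The construction amounts to relabelling the columns. The sketch below flattens `p[a][b][c]` into a table whose columns run over the pairs $(B_b, C_c)$ in the order $B_1C_1, B_1C_2, \ldots$. It then applies one possible "appropriate two-polytomy measure". As an assumption, that measure is Goodman and Kruskal's $\lambda_a$, which predicts the A class from the combined column class. The population is hypothetical.

```python
def combine_bc(p):
    """Rows A_a; columns all (B_b, C_c) pairs in the order B_1C_1, B_1C_2, ..."""
    return [
        [p[a][b][c] for b in range(len(p[a])) for c in range(len(p[a][b]))]
        for a in range(len(p))
    ]

def lambda_a(table):
    """Goodman-Kruskal lambda_a: optimal prediction of the row class from the column class."""
    rows = [sum(r) for r in table]
    best_row = max(rows)
    if best_row >= 1.0:
        raise ValueError("lambda_a is indeterminate for this table")
    return (sum(max(col) for col in zip(*table)) - best_row) / (1.0 - best_row)

p = [
    [[0.20, 0.05], [0.05, 0.20]],
    [[0.05, 0.10], [0.20, 0.15]],
]
table = combine_bc(p)
print(table, lambda_a(table))
```

As the text notes, the association between B and C plays no role here. Only the joint $(B_b, C_c)$ label of each column matters.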


13. SAMPLING PROBLEMS 


The discussion thus far has been in terms of known populations, 
whereas in practice one generally deals with a sample from an unknown 
population. One then asks, given a formal measure of association, how 
to estimate its value, how to test hypotheses about it, and so on. 

Exact sampling theory for estimators from cross-classification 
tables is difficult to work with. However, the asymptotic theory is 
reasonably manageable, at least in some cases. We intend to discuss 
this in another paper, where we shall state some of the asymptotic 
distributions and say what we can of their value as approximations. 
14. CONCLUDING REMARKS


The aim of this paper has been to argue that measures of association 
should not be taken blindly from the handiest statistics textbook, but 
rather should be carefully constructed in a manner appropriate to the 
problem at hand. To emphasize and illustrate this argument we have 
described a number of such measures which we feel might be useful in 
several situations. While we naturally take a friendly view towards 
these measures, we can hardly claim that they are more than examples. 

This methodologically neutral position should not be carried to an 
extreme. It would be ridiculous to ask each empirical scientist in each 
separate study to forge afresh new statistical tools. The artist cannot 
paint many pictures if he must spend most of his time mixing pigments. 
Our belief is that each scientific area that has use for measures of association should, after appropriate argument and trial, settle down on
those measures most useful for its needs. 
A TEST OF GOODNESS OF FIT


T. W. ANDERSON AND D. A. DARLING*
Columbia University and University of Michigan 

Some (large sample) significance points are tabulated for 
a distribution-free test of goodness of fit which was introduced 
earlier by the authors. The test, which uses the actual ob- 
servations without grouping, is sensitive to discrepancies at the 
tails of the distribution rather than near the median. An il- 
lustration is given, using a numerical example used previously 
by Birnbaum in illustrating the Kolmogorov test. 


1. THE PROCEDURE 


The problem of statistical inference considered here is to test the hypothesis that a sample has been drawn from a population with a specified continuous cumulative distribution function $F(x)$. For example, the population may be specified by the hypothesis to be normal with mean 1 and variance 1/6; the corresponding cumulative distribution function is

$$F(x) = \int_{-\infty}^{x} \sqrt{\frac{3}{\pi}}\; e^{-3(y-1)^2}\,dy. \tag{1}$$


In practice the procedure really tests the hypothesis that the sample 
has been drawn from a population with a completely specified density 
function, since the cumulative distribution function is simply the integral 
of the density. 

The test procedure we have proposed earlier [1] is the following: Let $x_1 \le x_2 \le \cdots \le x_n$ be the $n$ observations in the sample in order, and let $u_j = F(x_j)$. Then compute

$$W_n^2 = -n - \frac{1}{n}\sum_{j=1}^{n} (2j-1)\left[\log u_j + \log(1 - u_{n-j+1})\right], \tag{2}$$

where the logarithms are the natural logarithms. If this number is too large, the hypothesis is to be rejected.
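Formula (2) is straightforward to program. The sketch below takes the values $u_j = F(x_j)$ and evaluates (2). The usage lines compute the $u_j$ for a hypothetical sample, under the hypothesis of a normal distribution with mean 1 and standard deviation $1/\sqrt{6}$, using a normal CDF built from `math.erf`. Birnbaum's 40 observations are not reproduced in this note, so the sample is invented. The asymptotic points are taken from the table in this note and are used here only for illustration, since the sample is small.

```python
import math

def anderson_darling_from_u(u):
    """W_n^2 from formula (2), given u_j = F(x_j) for the sample."""
    u = sorted(u)                       # F is increasing, so this orders by x_j
    n = len(u)
    if n == 0 or any(not 0.0 < v < 1.0 for v in u):
        raise ValueError("need at least one u_j, each strictly inside (0, 1)")
    s = sum(
        (2 * j - 1) * (math.log(u[j - 1]) + math.log(1.0 - u[n - j]))
        for j in range(1, n + 1)        # u[n - j] is u_{n-j+1} in 1-based notation
    )
    return -n - s / n

def normal_cdf(x, mean, sd):
    """Normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

ASYMPTOTIC_POINTS = {0.10: 1.933, 0.05: 2.492, 0.01: 3.857}

sample = [0.62, 0.85, 0.91, 1.04, 1.10, 1.23, 1.37, 1.55]   # hypothetical data
sd = 1.0 / math.sqrt(6.0)
w2 = anderson_darling_from_u([normal_cdf(x, 1.0, sd) for x in sample])
print(w2, w2 > ASYMPTOTIC_POINTS[0.10])
```

For a single observation with $u_1 = \tfrac{1}{2}$, the formula reduces to $2\log 2 - 1$. This matches direct integration of (4) with the weight function (6).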

This procedure may be used if one wishes to reject the hypothesis 
whenever the true distribution differs materially from the hypothetical 
and especially when it differs in the tails. 

Significance points for $W_n^2$ are not available for small sample sizes. The asymptotic significance points are given below:

* Work sponsored by Office of Scientific Research, U. S. Air Force, Contract AF18(600)-442, Project No. R-345-20-7.
The authors wish to acknowledge the assistance of Vernon Johns in the computations.
ASYMPTOTIC SIGNIFICANCE POINTS

Significance Level    Significance Point
       .10                  1.933
       .05                  2.492
       .01                  3.857








2. A NUMERICAL ILLUSTRATION 


Birnbaum [2] has considered a sample of 40 observations and applied the Kolmogorov statistic to test the hypothesis that the population from which the data came was normal with mean 1 and standard deviation $1/\sqrt{6}$. By this test he found the data were consistent with the hypothesis. We have analyzed the same data using (2), obtaining $W_n^2 = 1.158$, which is well below the 10 per cent significance point, and we do not reject the hypothesis.

The computation sheet for this calculation had the following columns: $x_j$, $\sqrt{6}(x_j - 1)$, $u_j = F(x_j)$, $1 - u_{n-j+1}$, $\log u_j$, $\log(1 - u_{n-j+1})$, and $-[\log u_j + \log(1 - u_{n-j+1})]$. The operation $u_j = F(x_j)$ is simply finding the probability to the left of $\sqrt{6}(x_j - 1)$ according to the standard normal distribution.

Another test procedure uses the Cramér-von Mises $\omega^2$ criterion given by

$$n\omega^2 = \frac{1}{12n} + \sum_{j=1}^{n}\left(u_j - \frac{2j-1}{2n}\right)^2. \tag{3}$$

The asymptotic distribution of this statistic is given in [1]. For Birnbaum's data we obtain $n\omega^2 = .1789$, which is also well below the 10 per cent asymptotic significance point of .3473.
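A matching sketch for (3) also works from the $u_j$. The only critical value it uses is the 10 per cent asymptotic point .3473 quoted above. The function name is hypothetical.

```python
def cramer_von_mises_from_u(u):
    """n * omega^2 from formula (3), given u_j = F(x_j) for the sample."""
    u = sorted(u)
    n = len(u)
    if n == 0:
        raise ValueError("need at least one observation")
    return 1.0 / (12 * n) + sum(
        (uj - (2 * j - 1) / (2 * n)) ** 2 for j, uj in enumerate(u, start=1)
    )

CVM_POINT_10_PERCENT = 0.3473   # asymptotic 10 per cent point quoted in the text

stat = cramer_von_mises_from_u([0.08, 0.31, 0.47, 0.52, 0.77, 0.90])  # hypothetical u_j
print(stat, stat > CVM_POINT_10_PERCENT)
```

The constant term $1/(12n)$ is the value the statistic takes when every $u_j$ sits exactly at $(2j-1)/(2n)$. For $n = 1$, $u_1 = \tfrac{1}{2}$, it equals the integral $\int_0^1 [F_1(u) - u]^2\,du = 1/12$.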
In these two examples we have used the asymptotic percentage points 
instead of the actual ones based on finite sample size. Empirical study 
suggests that the asymptotic value is reached very rapidly, and it ap- 
pears safe to use the asymptotic value for a sample size as large as 40. 
Application to the same data of the usual $\chi^2$ criterion of K. Pearson, using 8 categories each with expected frequency 5, shows that $\chi^2 = 6.4$, which with 7 degrees of freedom is not significant at the 10 per cent level.


3. DERIVATION OF THE CRITERION 


Several test procedures are based on comparing the specified cumulative distribution function $F(x)$ with its sample analogue, the empirical cumulative distribution function

$$F_n(x) = \frac{\text{no. of } x_j \le x}{n}.$$

The present writers suggested [1] the use of the criterion

$$W_n^2 = n\int_{-\infty}^{\infty} [F_n(x) - F(x)]^2\,\psi[F(x)]\,dF(x), \tag{4}$$

where $\psi(u)$ is some nonnegative weight function chosen by the experimenter to accentuate the values of $F_n(x) - F(x)$ where the test is desired to have sensitivity. The hypothesis is to be rejected if $W_n^2$ is sufficiently large. When $\psi(u) = 1$, this criterion is $n$ times the $\omega^2$ criterion.

The criterion $W_n^2$ is an average of the squared discrepancy $[F_n(x) - F(x)]^2$, weighted by $\psi[F(x)]$ and the increase in $F(x)$ (and the normalization $n$). If one wishes the test to have good power against alternatives in which $H(x)$, the true distribution, and $F(x)$ disagree near the tails of $F(x)$, and to this end is willing to sacrifice power against alternatives in which $H(x)$ and $F(x)$ disagree near the median of $F(x)$, it seems that one ought to choose $\psi(u)$ to be large for $u$ near 0 and 1, and small near $u = \tfrac{1}{2}$. Even if the alternative hypotheses are closely delineated, however, it appears difficult to find an “optimum” weight function $\psi(u)$. For a discussion of the general nature of power of distribution-free tests, see, for example, Birnbaum [3] and Lehmann [4].

For a given value of $x$, $F_n(x)$ is a binomial variable; it is distributed in the same way as the proportion of successes in $n$ trials, where the probability of success is $H(x)$. Thus $E[F_n(x)] = H(x)$ and

$$nE[F_n(x) - F(x)]^2 = nE[F_n(x) - H(x)]^2 + n[F(x) - H(x)]^2 = H(x)[1 - H(x)] + n[F(x) - H(x)]^2. \tag{5}$$


Under the null hypothesis ($H(x) = F(x)$), the variance is $F(x)[1 - F(x)]$. In a sense, we would equalize the sampling error over the entire range of $x$ by weighting the deviation by the reciprocal of the standard deviation under the null hypothesis, that is, by using

$$\psi(u) = \frac{1}{u(1-u)} \tag{6}$$

as a weight function. This function has the effect of weighting the tails heavily, since it is large near $u = 0$ and $u = 1$. It is this weight function (6) which we treat in the present note.
Formula (2) is obtained by writing (4) as

$$\begin{aligned}
W_n^2 &= n\int_{-\infty}^{\infty}\frac{[F_n(x) - F(x)]^2}{F(x)[1 - F(x)]}\,dF(x) \\
&= n\int_{-\infty}^{x_1}\frac{F(x)^2}{F(x)[1 - F(x)]}\,dF(x) + n\sum_{j=1}^{n-1}\int_{x_j}^{x_{j+1}}\frac{[j/n - F(x)]^2}{F(x)[1 - F(x)]}\,dF(x) \\
&\quad + n\int_{x_n}^{\infty}\frac{[1 - F(x)]^2}{F(x)[1 - F(x)]}\,dF(x)
\end{aligned}$$

and letting $F(x) = u$ ($dF(x) = du$). Straightforward integration and collection of terms gives (2). The formula (2.5) in [1] cannot be used directly here, for that formula requires that $\int_0^1 \psi(u)\,du < \infty$, which is not true of (6).


4. COMPUTATION OF THE ASYMPTOTIC SIGNIFICANCE POINTS 


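It was proved in [1] that the limiting characteristic function of $W_n^2$ defined in either (2) or (4) is

$$\phi(t) = \lim_{n\to\infty} E\left(e^{itW_n^2}\right) = \sqrt{\frac{-2\pi i t}{\cos\left(\tfrac{1}{2}\pi\sqrt{1 + 8it}\right)}}, \tag{7}$$

and that the inversion of this characteristic function gave for the limiting cumulative distribution of $W_n^2$ the expression

$$\lim_{n\to\infty} \Pr(W_n^2 \le z) = \frac{\sqrt{2\pi}}{z}\sum_{j=0}^{\infty}\frac{(-1)^j\,\Gamma(j + \tfrac{1}{2})(4j+1)}{\Gamma(\tfrac{1}{2})\,j!}\,e^{-(4j+1)^2\pi^2/(8z)}\int_0^{\infty}\exp\left[\frac{z}{8(w^2+1)} - \frac{(4j+1)^2\pi^2 w^2}{8z}\right]dw. \tag{8}$$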
0 


The terms of this series alternate in sign, and the $(j+1)$st term is less than the $j$th term, $j \ge 1$; thus the error involved in using only $j$ terms of this series is less than the $(j+1)$st term for $j \ge 0$. By using the fact that $e^{z/[8(w^2+1)]} \le e^{z/8}$, one can easily verify that to compute the probabilities correctly to four decimal places, one needs only the 0th term for the first two significance points and the 0th and 1st terms for the third significance point. The laborious part of the computation is the evaluation of the integral. Let $[(4j+1)\pi/(2\sqrt{z})]\,w = y$; then the integrand is $f(y)\,e^{-y^2/2}$. The $y$-axis was divided into intervals according to the integral of $e^{-y^2/2}$, and numerical integration was performed.
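Series (8) can also be evaluated with a crude modern quadrature, as a cross-check on the table of significance points. The sketch below differs from the procedure described above: it uses a plain trapezoidal rule in $w$ on $[0, 20]$ rather than the intervals based on $e^{-y^2/2}$. The truncation point, the step count, and the number of series terms are arbitrary assumptions. If the sketch is correct, $1 - F(z)$ at the three tabulated points should come out close to .10, .05, and .01.

```python
import math

def _integrand(w, z, c):
    # exp[z / (8(w^2 + 1)) - c w^2], where c = (4j+1)^2 pi^2 / (8z)
    return math.exp(z / (8.0 * (w * w + 1.0)) - c * w * w)

def ad_limiting_cdf(z, terms=6, upper=20.0, steps=4000):
    """Approximate lim P(W_n^2 <= z) from series (8)."""
    if z <= 0:
        raise ValueError("z must be positive")
    h = upper / steps
    total = 0.0
    for j in range(terms):
        coef = ((-1) ** j) * math.gamma(j + 0.5) * (4 * j + 1) / (
            math.gamma(0.5) * math.factorial(j))
        c = (4 * j + 1) ** 2 * math.pi ** 2 / (8.0 * z)
        # trapezoidal rule on [0, upper]; the integrand is negligible beyond it
        inner = sum(_integrand(i * h, z, c) for i in range(1, steps))
        integral = h * (0.5 * _integrand(0.0, z, c) + inner + 0.5 * _integrand(upper, z, c))
        total += coef * math.exp(-c) * integral
    return math.sqrt(2.0 * math.pi) / z * total

for point in (1.933, 2.492, 3.857):
    print(point, 1.0 - ad_limiting_cdf(point))
```

Consistent with the remark above, the $j = 1$ term changes the result only in the fourth decimal place, and only at the largest of the three points.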
The moments of the asymptotic distribution are fairly easy to obtain from formulas given in [1]. The first two are

$$\lim_{n\to\infty} E(W_n^2) = E(W^2) = \sum_{j=1}^{\infty}\frac{1}{j(j+1)} = 1,$$

$$\lim_{n\to\infty} \operatorname{Var}(W_n^2) = 2\sum_{j=1}^{\infty}\frac{1}{j^2(j+1)^2} = \frac{2}{3}(\pi^2 - 9) \approx .57974.$$


The asymptotic significance points are computed to assure the prob- 
abilities (significance levels) to be correct to four decimal places. 
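The two moment series converge quickly, so they are easy to check numerically. The sketch sums them and compares the variance with the closed form $\tfrac{2}{3}(\pi^2 - 9)$. The number of terms is an arbitrary choice.

```python
import math

def asymptotic_moment_sums(n_terms):
    """Partial sums of the series for lim E(W_n^2) and lim Var(W_n^2)."""
    mean = sum(1.0 / (j * (j + 1)) for j in range(1, n_terms + 1))
    var = 2.0 * sum(1.0 / (j * j * (j + 1) ** 2) for j in range(1, n_terms + 1))
    return mean, var

mean, var = asymptotic_moment_sums(100_000)
closed_form = 2.0 / 3.0 * (math.pi ** 2 - 9.0)
print(mean, var, closed_form)
```

The mean series telescopes to $1 - 1/(N+1)$ after $N$ terms. The variance partial sums approach the closed form much faster.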


REFERENCES 


[1] Anderson, T. W., and Darling, D. A., “Asymptotic theory of certain ‘goodness of fit’ criteria based on stochastic processes,” Annals of Mathematical Statistics, 23 (1952), 193-212.

[2] Birnbaum, Z. W., “Numerical tabulation of the distribution of Kolmogorov's statistic for finite sample size,” Journal of the American Statistical Association, 47 (1952), 425-41.

[3] Birnbaum, Z. W., “Distribution-free tests of fit for continuous distribution functions,” Annals of Mathematical Statistics, 24 (1953), 1-8.

[4] Lehmann, E. L., “The power of rank tests,” Annals of Mathematical Statistics, 24 (1953), 23-43.





UNIVARIATE TWO-POPULATION DISTRIBUTION-FREE 
DISCRIMINATION 


DAVID S. STOLLER
University of California at Los Angeles* 


A distribution-free procedure for classifying a univariate random variable, $z$, into one of two populations on the basis of a sample of size $N$, in which $m$ members are classified into one population and the remaining $(N - m)$ into the other, is given as follows: Let $t(z) = k(z) - h(z)$, where $k(z)$ is the number of observations from the first population which are less than $z$ and $h(z)$ is similarly defined for the second population. If $z \le \zeta^*$, where $\zeta^*$ is that value of $z$ for which $t(z)$ is a maximum, classify $z$ into the first population, otherwise into the second. The probability of correct classification, and its estimate, $[N - m + t(\zeta^*)]/N$, both converge in probability to the maximum attainable probability of correct classification.


1. INTRODUCTION 


The following example typifies a discrimination problem which is of interest. A group of $N$ students take an aptitude test for a certain course, and receive scores $z_1, \ldots, z_N$. At the end of the course, all students are classified into two groups, “superior” and “inferior,” on the basis of final grades or any other criteria. Another student takes the aptitude test and gets a score $z_{N+1}$. It is desired to classify this student as “superior” or “inferior.” This will be done on the basis of selecting a discriminating score, $\zeta^*$, based on the previous $N$ scores, and classifying the student as “superior” if $z_{N+1} > \zeta^*$ and “inferior” if $z_{N+1} \le \zeta^*$.

When there is complete a priori knowledge about the distribution functions and the relative frequency of occurrence of the two groups, Hoel and Peterson [6] have shown how to find an optimum discriminating point, $\zeta$, by maximizing the probability of correctly classifying $z_{N+1}$. If the relative frequency of the two groups is not known, but there is otherwise complete knowledge, Anderson [1] and Welch [9] have shown how to find an optimum discriminating point by a minimax procedure.

If, further, there exists only partial a priori knowledge about the probability distributions, specifically, if the functional forms of the distributions are known but the parameters and relative frequencies are unknown, then, under certain restrictions, estimating the optimum discriminating point by replacing unknown parameters with their maximum likelihood estimates is an asymptotically optimum procedure [6].
A distribution-free procedure for the case where there exists no a priori knowledge of the parameters or form of the distribution functions has been investigated by Fix and Hodges [4], whereby $z_{N+1}$ is classified into one group or another according to whether the sample values, $z_j$, “closest” to $z_{N+1}$ are mostly in one or another group. In [4], consistency properties are proved about the probability of misclassification induced by this procedure. The small sample behavior for some special cases is considered in a later paper [5].

The present paper proposes a distribution-free procedure for the uni- 
variate two-population case, together with an estimate of the probability 
of correct classification. It is shown that (1) the estimate of the probability 
of correct classification is a consistent estimate of the optimum prob- 
ability of correct classification, (2) the probability of correct classification 
induced by this procedure converges in probability to the optimum prob- 
ability of correct classification. 


2. STATEMENT OF PROBLEM 


Let $\Pi$ be a composite univariate population, in which $\Pi_1$ and $\Pi_2$ are sub-populations with cumulative distribution functions $F_1(z)$ and $F_2(z)$. Let $\theta$ be the probability that $z$, a random member of $\Pi$, is a member of $\Pi_1$, i.e., $z$ is a random variable defined by the cumulative distribution function $\theta F_1(z) + (1 - \theta)F_2(z)$.

A random sample, $z_1, \ldots, z_N$, is taken from $\Pi$, in which $m$ of the $z_j$ are identifiable as members of $\Pi_1$, and the remaining $N - m$ as members of $\Pi_2$. Another sample value, $z_{N+1}$, which is unidentifiable, is obtained at random. Without a priori knowledge of $\theta$ or of the functional forms or parameters of $F_1(z)$ and $F_2(z)$, it is desired to:

(1) Classify $z_{N+1}$ as a member of $\Pi_1$ or $\Pi_2$.
(2) Estimate the probability that $z_{N+1}$ has been correctly classified.


The functions $F_1(z)$ and $F_2(z)$, however, will be restricted to be such that (1) $F_1(z)$, $F_2(z)$ are absolutely continuous (so that the probability of tied sample values is zero), and (2) the optimum discrimination scheme consists of classifying $z_{N+1}$ according to whether $z_{N+1} \le \zeta$ or $z_{N+1} > \zeta$, where $\zeta$ is a unique point. An optimum discrimination scheme is here defined as one that maximizes the probability of correct classification.

The probability of correct classification when an arbitrary point, $z$, is used to classify $z_{N+1}$ by the above rule is given by:

$$Q(z) = \theta F_1(z) + (1 - \theta)[1 - F_2(z)].$$

By definition, $Q(z)$ achieves its maximum at $z = \zeta$.
3. A DISTRIBUTION-FREE DISCRIMINATION PROCEDURE

An estimate of the discriminating point, $\zeta$, will be made by first considering a distribution-free estimate of the function, $Q(z)$. This estimate is formed by replacing $\theta$ by $m/N$, and $F_1(z)$ and $F_2(z)$ by the appropriate step functions, as follows.

Let $x_1, \ldots, x_m$ be the sample members from $\Pi_1$, ordered by magnitude, and $y_1, \ldots, y_{N-m}$ the similarly ordered members from $\Pi_2$. Define the step functions:

$$S_m(z) = k/m, \quad x_k \le z < x_{k+1}; \quad k = 0, \ldots, m,$$

$$S_{N-m}(z) = h/(N - m), \quad y_h \le z < y_{h+1}; \quad h = 0, \ldots, N - m,$$

where

$$x_0 = y_0 = -\infty; \quad x_{m+1} = y_{N-m+1} = +\infty.$$

Then a distribution-free estimate of $Q(z)$ is defined by

$$\hat{Q}(z) = (m/N)S_m(z) + [(N - m)/N][1 - S_{N-m}(z)] = \frac{1}{N}[(N - m) + (k - h)].$$

(Note that $0 \le \hat{Q}(z) \le 1$.) Take any value of $z$ that maximizes $\hat{Q}(z)$, say $z = \zeta^*$. Then $\zeta^*$ is an estimate of the optimum classification point, $\zeta$, and $\hat{Q}(\zeta^*)$ is an estimate of the probability of correct classification of the scheme: Classify $z_{N+1}$ as a member of $\Pi_1$ if $z_{N+1} \le \zeta^*$, otherwise as a member of $\Pi_2$.


4. ILLUSTRATION 


To illustrate the procedure, consider the following example. A class of 25 beginning algebra students took a test on elementary algebraic operations after two weeks of instruction. At the end of the course, the instructor classified the 25 students into two groups: “inferior” and “superior.” Ranking the students by means of their two-week test scores, the following ordering resulted, where the scores of the 8 “superior” students are marked with an asterisk:

50, 51, 57½, 63, 64, 68, 72, 73, 74*, 74½, 75*, 75½, 76*, 76½, 77, 78, 81, 82, 84*, 85*, 86*, 87½, 89*, 91*, 92.

To this corresponds the following ordering of the $x$'s and $y$'s:

$x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, y_1, x_9, y_2, x_{10}, y_3, x_{11}, x_{12}, x_{13}, x_{14}, x_{15}, y_4, y_5, y_6, x_{16}, y_7, y_8, x_{17}$.
Notice that $(k - h)$ may be computed very rapidly by the following procedure. At $x_0$, $(k - h) = 0$. Proceeding along the ordered $x$'s and $y$'s, add one whenever an $x$ is encountered and subtract one whenever a $y$ is encountered. In the example, it can be seen that $(k - h)$ attains a maximum (of 12) in the interval from $x_{15}$ to $y_4$, where $82 \le z < 84$. Therefore, any point in this interval, say $\zeta^* = 83$, yields a discriminating procedure. An estimate of the probability of correct classification induced by $\zeta^* = 83$ is $\hat{Q}(83) = 20/25$, where the numerator is equal to $(N - m)$ plus the maximum value of $(k - h)$, and the denominator is $N$.
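The running-count rule is easy to program. The sketch below takes the pooled sample in ascending order with a label per observation: `'x'` for $\Pi_1$ and `'y'` for $\Pi_2$. It returns the maximum of $k - h$, the first interval $[\text{lower}, \text{upper})$ on which that maximum holds, and $\hat{Q}$ there. The function name is hypothetical. Reading the fractional scores as halves (57½ and so on) is an assumption about the original typesetting. It does not affect the result, because only the order of the scores matters.

```python
import math

def discriminating_interval(scores, labels):
    """Return (max of k - h, (lower, upper), Q_hat) for the first interval
    [lower, upper) on which k - h attains its maximum.

    scores: pooled sample in ascending order.
    labels: 'x' for a member of Pi_1, 'y' for a member of Pi_2.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if list(scores) != sorted(scores):
        raise ValueError("scores must be in ascending order")
    n = len(scores)
    n_minus_m = labels.count("y")                    # N - m
    bounds = [-math.inf] + list(scores) + [math.inf]
    diff, best, best_i = 0, 0, 0                     # k - h = 0 below every score
    for i, lab in enumerate(labels, start=1):
        diff += 1 if lab == "x" else -1              # an x adds one, a y subtracts one
        if diff > best:
            best, best_i = diff, i
    interval = (bounds[best_i], bounds[best_i + 1])  # k - h == best on [lower, upper)
    q_hat = (n_minus_m + best) / n                   # [(N - m) + max(k - h)] / N
    return best, interval, q_hat

scores = [50, 51, 57.5, 63, 64, 68, 72, 73, 74, 74.5, 75, 75.5, 76,
          76.5, 77, 78, 81, 82, 84, 85, 86, 87.5, 89, 91, 92]
labels = "xxxxxxxxyxyxyxxxxxyyyxyyx"
print(discriminating_interval(scores, labels))   # (12, (82, 84), 0.8)
```

The output reproduces the figures in the text: a maximum of 12 on $82 \le z < 84$ and $\hat{Q} = 20/25$.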

When applying the procedure described above, it may happen that two or more intervals exist in which $(k - h)$ is maximized. It is subsequently proved that any point which maximizes $(k - h)$ possesses asymptotically optimum characteristics. From a large sample point of view, therefore, when more than one maximizing interval is encountered, any point from any one of the maximizing intervals may be selected as $\zeta^*$. In small samples, as a practical consideration, one should select an “average” value of $\zeta^*$, say the average value of the midpoints of the maximizing intervals. The “average” value of $\zeta^*$, as compared to any one value of $\zeta^*$, will possess a more stable sampling behavior for small sample sizes, and will have the same asymptotically optimum characteristics discussed subsequently.
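The small-sample rule of averaging midpoints can be sketched in the same style. The code lists every interval on which $k - h$ attains its maximum and averages their midpoints. The example data are hypothetical. Rejecting the case in which a maximizing interval is unbounded is an assumption of this sketch, since a midpoint is then undefined.

```python
import math

def maximizing_intervals(scores, labels):
    """All intervals [lower, upper) on which k - h attains its maximum."""
    bounds = [-math.inf] + list(scores) + [math.inf]
    values = [0]                                     # k - h below every score
    for lab in labels:
        values.append(values[-1] + (1 if lab == "x" else -1))
    best = max(values)
    return best, [(bounds[i], bounds[i + 1]) for i, v in enumerate(values) if v == best]

def average_midpoint(intervals):
    """The 'average' zeta*: mean of the midpoints of the maximizing intervals."""
    if any(math.isinf(lo) or math.isinf(hi) for lo, hi in intervals):
        raise ValueError("midpoint undefined for an unbounded interval")
    mids = [(lo + hi) / 2 for lo, hi in intervals]
    return sum(mids) / len(mids)

best, intervals = maximizing_intervals([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "xyxyyx")
print(best, intervals, average_midpoint(intervals))
```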

It may also happen that tied sample values occur, due either to rounding off of observations or to a priori discreteness of the populations. Ties which do not affect the calculation of ζ* may be ranked arbitrarily. When the calculation of ζ* is affected by one or more sets of ties due to rounding off of observations, each critical set of ties in each population may be distributed uniformly over the round-off range of the observations and then ordered accordingly. For example, in the sample


1, 1, 1, 2, 3, 4, 4, 4, 4, 4, 5, 6, 7,


the tied sample values (1, 1, 1) may be ranked arbitrarily since, by inspection, this does not affect the value of ζ*. The set (4, 4, 4, 4, 4), which does affect the calculation of ζ*, contains two observations from Π₁, which, when distributed uniformly over the round-off range 3½ to 4½, are assigned the values (3¾, 4¼). Likewise, the remaining tied set (4, 4, 4), from Π₂, is assigned the values (3⅔, 4, 4⅓).
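The round-off device amounts to a simple rule: a set of c tied values at v, recorded to within ±w, is replaced by the midpoints of c equal subintervals of [v − w, v + w]. The sketch below (the helper name `spread_ties` is hypothetical) uses exact fractions and reproduces the two assignments in the example, with w = ½.

```python
from fractions import Fraction

def spread_ties(value, count, half_width):
    """Spread `count` tied observations at `value` uniformly over the
    round-off range [value - half_width, value + half_width]."""
    lower = value - half_width
    step = 2 * half_width / count          # width of each equal subinterval
    # midpoint of the i-th subinterval
    return [lower + step * (i + Fraction(1, 2)) for i in range(count)]

half = Fraction(1, 2)
print(spread_ties(4, 2, half))  # [Fraction(15, 4), Fraction(17, 4)]
print(spread_ties(4, 3, half))  # [Fraction(11, 3), Fraction(4, 1), Fraction(13, 3)]
```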

When the populations are a priori discrete and critical sets of tied values occur, k(z) may be redefined as the number of observations from Π₁ less than or equal to z, and h(z) as the number of observations from Π₂ less than or equal to z. Then k(z) − h(z) is uniquely defined even when ties occur. In the example above, treating the sample values as a priori discrete, k(1) − h(1) = 1, k(4) − h(4) = 2, and k(z) − h(z) is maximized in the interval 3 ≤ z < 4.

In Section 1, the criterion for determining an optimum classification scheme is based on maximizing the probability of correctly classifying one observation whose population membership is unknown. A classification scheme which is optimum in this sense for one unidentified observation is also optimum for a group of r independent, unidentified observations: if p is the probability that one observation is correctly classified, then $p^r$ is the probability that the entire group is correctly classified, and the latter probability is maximized when p is maximized.


5. DISTRIBUTION OF $\hat Q(z)$

For fixed m, k and h are independent binomial variates with means $mF_1(z)$ and $(N - m)F_2(z)$, respectively. Also, m is a binomial variate with mean Nθ; therefore it can be shown (by use of conditional expectation) that, for each z,

$$E[\hat Q(z)] = Q(z).$$

Thus, for each z, $\hat Q(z)$ is an unbiased estimate of $Q(z)$. It can also be shown that

$$\operatorname{Var}[\hat Q(z)] = c/N,$$

where c is a constant depending on θ, $F_1(z)$, $F_2(z)$, but not on N. Since

$$\lim_{N \to \infty} \frac{c}{N} = 0, \qquad (2)$$

by a Tchebychev inequality (see Cramér [2], Theorem 20.4) it is seen that, for each z, $\hat Q(z)$ is also a consistent estimate of $Q(z)$.
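A short derivation makes the constant c explicit. It is ours rather than the authors', and it assumes the rule "assign to Π₁ if x ≤ z" that underlies $\hat Q(z) = [(N - m) + k(z) - h(z)]/N$; under that rule $\hat Q(z)$ is simply the proportion of the N sample observations that the rule classifies correctly, and each observation independently comes from Π₁ with probability θ.

```latex
\begin{align*}
\hat Q(z) &= \frac{k(z) + [(N - m) - h(z)]}{N} = \frac{1}{N}\sum_{i=1}^{N} Y_i(z),\\
Y_i(z) &= \begin{cases}
  1 & \text{if observation } i \text{ is from } \Pi_1 \text{ and } x_i \le z,
      \text{ or from } \Pi_2 \text{ and } x_i > z,\\
  0 & \text{otherwise,}
\end{cases}\\
\Pr\{Y_i(z) = 1\} &= \theta F_1(z) + (1 - \theta)[1 - F_2(z)] = Q(z),\\
E[\hat Q(z)] &= Q(z), \qquad
\operatorname{Var}[\hat Q(z)] = \frac{Q(z)[1 - Q(z)]}{N}.
\end{align*}
```

So c = Q(z)[1 − Q(z)] ≤ ¼, which depends on θ, F₁(z) and F₂(z) but not on N, as stated.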


6. SOME PROPERTIES OF THE CLASSIFICATION PROBABILITY
ESTIMATE, $\hat Q(\zeta^*)$


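It is readily seen that $\hat Q(\zeta^*)$, the estimate of the classification probability induced by the point estimate ζ*, is non-negatively biased: since $\hat Q(\zeta^*) \ge \hat Q(\zeta)$,

$$E[\hat Q(\zeta^*) - \hat Q(\zeta)] \ge 0,$$

from which, because $\hat Q(\zeta)$ is an unbiased estimate of $Q(\zeta)$ (sec. 5),

$$E[\hat Q(\zeta^*)] \ge Q(\zeta).$$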
An example (suggested by a referee) of the magnitude of the bias for small samples is given by the special case θ = ½, $F_1(z) = F_2(z)$. Here $Q(\zeta) = \tfrac12$, but, by definition, $\hat Q(\zeta^*) \ge \tfrac12$. Further, if m = N/2,

$$\hat Q(\zeta^*) = \tfrac12 + \tfrac12 \max_z\,(k/m - h/m)$$

and

$$\Pr\{\hat Q(\zeta^*) - Q(\zeta) > \beta\} = \Pr\{|\hat Q(\zeta^*) - Q(\zeta)| > \beta\} = \Pr\{\max_z |k/m - h/m| > 2\beta\},$$

the latter probability having been tabulated by Massey [8]. For two equal samples of size 10 from the same population, $\Pr\{\hat Q(\zeta^*) > .7\}$ equals about .16, and $\Pr\{\hat Q(\zeta^*) > .75\}$ equals about .05.
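The two figures can be checked by exact enumeration. When both samples of size n come from the same continuous population, all $\binom{2n}{n}$ interleavings of the pooled sample are equally likely, and $\max_z |k/m - h/m|$ is the largest |i − j|/n along the corresponding lattice path. The sketch below (function name hypothetical) counts the paths that stay inside |i − j| < c. Since D = max |k/m − h/m| is a multiple of 1/10 when n = 10, $\hat Q(\zeta^*) > .7$ corresponds to D ≥ .5 and $\hat Q(\zeta^*) > .75$ to D ≥ .6; reading the quoted figures as these strict inequalities for the two-sided statistic is our assumption.

```python
from fractions import Fraction
from math import comb

def prob_ks_two_sided(n, c):
    """Exact Pr{max_z |k/n - h/n| >= c/n} for two independent samples of
    equal size n from the same continuous population (c >= 1)."""
    # ways[i][j]: orderings of the first i + j pooled observations
    # (i from sample 1, j from sample 2) whose path never reaches |i - j| >= c
    ways = [[0] * (n + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for i in range(n + 1):
        for j in range(n + 1):
            if (i, j) == (0, 0) or abs(i - j) >= c:
                continue                      # forbidden cells stay at 0
            from_i = ways[i - 1][j] if i > 0 else 0
            from_j = ways[i][j - 1] if j > 0 else 0
            ways[i][j] = from_i + from_j
    return 1 - Fraction(ways[n][n], comb(2 * n, n))

p_70 = prob_ks_two_sided(10, 5)   # Pr{Q(zeta*) > .70} = Pr{D >= .5}
p_75 = prob_ks_two_sided(10, 6)   # Pr{Q(zeta*) > .75} = Pr{D >= .6}
print(round(float(p_70), 3), round(float(p_75), 3))  # 0.168 0.052
```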

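However, $\hat Q(\zeta^*)$ is a consistent estimate of $Q(\zeta)$. For given ε, η > 0, by sec. 5, for sufficiently large N,

$$\Pr\{\hat Q(\zeta) < Q(\zeta) - \epsilon\} < \eta,$$

and since $\hat Q(\zeta^*) \ge \hat Q(\zeta)$,

$$\Pr\{\hat Q(\zeta^*) < Q(\zeta) - \epsilon\} < \eta.$$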


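Now,

$$\begin{aligned}
\hat Q(\zeta^*) - Q(\zeta) &= \theta - \frac{m}{N} + \max_z\Big\{\frac{k}{N} - \frac{h}{N}\Big\} - \max_z\big\{\theta F_1(z) - (1 - \theta)F_2(z)\big\}\\
&\le \Big|\theta - \frac{m}{N}\Big| + \max_z\Big\{\frac{k}{N} - \theta F_1(z) - \Big[\frac{h}{N} - (1 - \theta)F_2(z)\Big]\Big\}\\
&\le \Big|\theta - \frac{m}{N}\Big| + \max_z\Big\{\frac{k}{N} - \theta F_1(z)\Big\} + \max_z\Big\{(1 - \theta)F_2(z) - \frac{h}{N}\Big\},
\end{aligned}$$

where the first inequality uses $\max_z f - \max_z g \le \max_z (f - g)$.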


Since m/N is a consistent estimate of θ, for N > N′(ε, η) the inequality

$$\Pr\{|\theta - m/N| > \epsilon\} < \eta$$

is satisfied. Now, since k ≤ m,

$$\max_z\Big\{\frac{k}{N} - \theta F_1(z)\Big\} \le \max_z\Big|\frac{k}{N} - \theta\frac{k}{m}\Big| + \max_z\Big|\theta\frac{k}{m} - \theta F_1(z)\Big| \le \Big|\frac{m}{N} - \theta\Big| + \theta \max_z\Big|\frac{k}{m} - F_1(z)\Big|.$$

Consider the expression

$$\max_z\Big|\frac{k}{m} - F_1(z)\Big|$$

with m temporarily held fixed. Now $\sqrt{m}\,\max_z |k/m - F_1(z)|$ is well known (Kolmogorov [7], Feller [3]) to possess an asymptotic probability distribution function with finite mean and variance. Therefore, for any fixed m > M(ε, η/2),


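$$\Pr\Big\{\max_z\Big|\frac{k}{m} - F_1(z)\Big| > \epsilon\Big\} < \eta/2.$$

Now for N > N″(M), Pr{m < M} < η/2; therefore for N > N″(M),

$$\Pr\Big\{\max_z\Big|\frac{k}{m} - F_1(z)\Big| > \epsilon\Big\} < \eta.$$

A similar N‴(ε, η) exists for

$$\max_z\Big|\frac{h}{N - m} - F_2(z)\Big|;$$

thus for N > max(N′, N″, N‴),

$$\Pr\{\hat Q(\zeta^*) > Q(\zeta) + 5\epsilon\} < \eta.$$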


7. AN ASYMPTOTICALLY OPTIMUM PROPERTY OF THE DISCRIMINATING
POINT ESTIMATE, ζ*


It can also be shown that $Q(\zeta^*)$, the actual (but unknown) classification probability resulting from the use of the discriminating point estimate ζ*, converges in probability to $Q(\zeta)$, the optimum classification probability; i.e., for sufficiently large sample sizes, $Q(\zeta^*)$ is arbitrarily close to $Q(\zeta)$ with probability arbitrarily close to one. For,

$$|Q(\zeta^*) - Q(\zeta)| \le |Q(\zeta^*) - \hat Q(\zeta^*)| + |\hat Q(\zeta^*) - Q(\zeta)|.$$

By sec. 6, for N > N(ε, η),

$$\Pr\{|\hat Q(\zeta^*) - Q(\zeta)| > \epsilon\} < \eta.$$

Now,

$$|\hat Q(\zeta^*) - Q(\zeta^*)| \le \Big|\theta - \frac{m}{N}\Big| + \Big|\theta F_1(\zeta^*) - \frac{k}{N}\Big| + \Big|(1 - \theta)F_2(\zeta^*) - \frac{h}{N}\Big|,$$

where k and h are evaluated at ζ*. Examining the second term,

$$\Big|\theta F_1(\zeta^*) - \frac{k}{N}\Big| \le \Big|\theta F_1(\zeta^*) - \theta\frac{k}{m}\Big| + \Big|\theta\frac{k}{m} - \frac{k}{N}\Big| \le \theta\max_z\Big|F_1(z) - \frac{k}{m}\Big| + \frac{k}{m}\Big|\theta - \frac{m}{N}\Big| \le \max_z\Big|F_1(z) - \frac{k}{m}\Big| + \Big|\theta - \frac{m}{N}\Big|.$$

Consequently, in a manner similar to the discussion of sec. 6, it can be shown that there exists an N(ε, η) such that for N > N(ε, η),

$$\Pr\{|Q(\zeta^*) - Q(\zeta)| > \epsilon\} < \eta.$$
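The convergence of $Q(\zeta^*)$ to $Q(\zeta)$ can be illustrated by simulation. The sketch assumes two unit-variance normal populations with means 0 (Π₁) and 2 (Π₂) and θ = ½, a purely hypothetical choice for which ζ = 1 and Q(ζ) = Φ(1) ≈ .8413. It uses the rule "assign to Π₁ if x ≤ z" and, for brevity, takes the midpoint of the first maximizing interval as ζ* rather than averaging midpoints, which the asymptotic argument above permits. All names are illustrative.

```python
import random
from math import erf, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def true_q(z, theta, mu1, mu2):
    """Q(z) = theta*F1(z) + (1 - theta)*(1 - F2(z)) for unit-variance normals."""
    return theta * norm_cdf(z - mu1) + (1 - theta) * (1 - norm_cdf(z - mu2))

def estimate_zeta(sample1, sample2):
    """Midpoint of the first interval on which k(z) - h(z) is largest."""
    pooled = sorted([(x, 1) for x in sample1] + [(x, 2) for x in sample2])
    diff, best, zeta = 0, None, None
    for idx in range(len(pooled) - 1):
        x, label = pooled[idx]
        diff += 1 if label == 1 else -1      # k - h on [x, next value)
        if best is None or diff > best:
            best, zeta = diff, (x + pooled[idx + 1][0]) / 2
    return zeta

def simulated_q_of_zeta_star(n_total, theta=0.5, mu1=0.0, mu2=2.0, seed=1):
    rng = random.Random(seed)
    s1, s2 = [], []
    for _ in range(n_total):                 # m is binomial(N, theta)
        if rng.random() < theta:
            s1.append(rng.gauss(mu1, 1.0))
        else:
            s2.append(rng.gauss(mu2, 1.0))
    return true_q(estimate_zeta(s1, s2), theta, mu1, mu2)

q_opt = true_q(1.0, 0.5, 0.0, 2.0)           # Q(zeta) = Phi(1)
for n in (50, 500, 5000, 50000):
    print(n, round(q_opt - simulated_q_of_zeta_star(n), 4))
```

The printed gap Q(ζ) − Q(ζ*) is never negative, since ζ maximizes Q, and it should shrink toward zero as N grows, though not necessarily monotonically in a single run.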
USE OF NORMAL PROBABILITY PAPER


HERMAN CHERNOFF AND GERALD J. LIEBERMAN
Stanford University*


Normal probability paper is so designed that the cumulative distribution function of a normally distributed chance variable appears as a straight line. It is a common practice to plot the observations of a sample on this paper to obtain a graphical check for normality or to obtain a graphical estimate of the mean and variance of the population. Textbooks, however, are not very specific about methods for plotting, for, although the ordered observations are plotted along the abscissa, some uncertainties about the corresponding ordinates are left unresolved. The purpose of this paper is to indicate, with a special example, that any graphical technique should depend to a large extent on the purpose for which the graph is drawn. In particular, it presents tables covering sample sizes up to 10 for selecting the ordinates on normal probability paper so as to obtain "optimum" graphical estimates of the mean ξ and the standard deviation σ of a normal distribution. The somewhat more complicated problem of selecting the ordinates to obtain an "optimum" test for normality is not discussed.


1. UNBIASED ESTIMATES OF ξ AND σ


BY MEANS of a non-linear transformation of the vertical scale on the graph of the cumulative-normal-distribution curve, it is possible
to transform this curve to a straight line. Graph paper possessing this 
property is known as normal probability paper. The abscissa scale 
corresponds to the values of a normally distributed chance variable, 
whereas the ordinate scale represents a number, p, between 0 and 1. 
Neither 0 nor 1 appears on the ordinate scale. 

If a sample of n independent observations is to be plotted on normal probability paper, it is natural to arrange them in ascending order, i.e., $u_1 \le u_2 \le \cdots \le u_n$, and to plot a point corresponding to each observation. One such plot is $(u_1, 1/n), (u_2, 2/n), \ldots, (u_n, n/n)$. However, the last point does not appear on the graph, since p = 1 is not on the ordinate scale. Furthermore, the symmetry of the normal distribution suggests that $u_1$ and $u_n$ be treated in a "symmetric" fashion. Two alternative plots are

$$\Big(u_1, \frac{1}{n+1}\Big), \Big(u_2, \frac{2}{n+1}\Big), \ldots, \Big(u_n, \frac{n}{n+1}\Big)$$

and

$$\Big(u_1, \frac{1}{2n}\Big), \Big(u_2, \frac{3}{2n}\Big), \ldots, \Big(u_n, \frac{2n-1}{2n}\Big).$$


Since there is no obvious rationale for preferring one of these plots to 
the other, there arises the problem of selecting an “optimum” method 
of plotting. 
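The two symmetric alternatives are easy to tabulate. The sketch below (function names hypothetical) computes them for a given n and, as a check, reproduces rows [3] and [4] of Table I for n = 5.

```python
def positions_n_plus_1(n):
    """Plotting positions p_i = i / (n + 1), i = 1, ..., n."""
    return [i / (n + 1) for i in range(1, n + 1)]

def positions_half(n):
    """Plotting positions p_i = (i - 1/2) / n = (2i - 1) / (2n)."""
    return [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]

print([round(p, 5) for p in positions_n_plus_1(5)])  # [0.16667, 0.33333, 0.5, 0.66667, 0.83333]
print([round(p, 5) for p in positions_half(5)])      # [0.1, 0.3, 0.5, 0.7, 0.9]
```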

Let us consider the problem of graphically estimating the mean ξ and standard deviation σ of a normal population on the basis of a sample. Once the points $(u_1, p_1), (u_2, p_2), \ldots, (u_n, p_n)$ are plotted, a method which suggests itself is to fit a straight line visually to the points and to take the abscissa where the line intersects p = .5 as an estimate of the mean, and the distance between the abscissas where the line intersects p = .8413 and p = .5 as an estimate of the standard deviation. Let us assume that the visually fitted line is a very good approximation to the line that would be obtained by minimizing the sum of squares of the horizontal deviations from the line.¹ The problem then is to find what values of $p_1, p_2, \ldots, p_n$ yield good estimates, $\hat\xi$ and $\hat\sigma$, of the mean ξ and standard deviation σ of the normal population sampled. Since p is not represented on a linear scale, we shall transform to v = v(p), which is related to p by


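$$p = \int_{-\infty}^{v} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx. \qquad (1)$$

In terms of the ordinate v, the fitted straight line may be represented by

$$u = \hat\xi + \hat\sigma v, \qquad (2)$$

where $\hat\xi$ and $\hat\sigma$ are the estimates of ξ and σ. If $v_i = v(p_i)$, i = 1, 2, ..., n, these estimates are

$$\hat\xi = \bar u - \hat\sigma \bar v \qquad (3)$$

and

$$\hat\sigma = \frac{\sum_{i=1}^{n}(u_i - \bar u)(v_i - \bar v)}{\sum_{i=1}^{n}(v_i - \bar v)^2} = \frac{\sum_{i=1}^{n} u_i(v_i - \bar v)}{\sum_{i=1}^{n}(v_i - \bar v)^2}. \qquad (4)$$


¹ The use of horizontal deviations is suggested by the fact that the $p_i$ are not chance variables and the $u_i$ are.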
If we require that $\hat\xi$ be unbiased and σ ≠ 0, we must have $\bar v = 0$. In that case $\hat\xi = \bar u$, which is the mean of the sample. In fact, it is the optimum estimate of ξ from many points of view.
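Equations (3) and (4) amount to an ordinary least-squares fit of the ordered observations u on v. A minimal sketch, assuming Python's `statistics.NormalDist` for the inverse of (1); the function name is hypothetical. With symmetric positions, $\bar v = 0$ and $\hat\xi$ reduces to the sample mean.

```python
from statistics import NormalDist

def graphical_estimates(sample, positions):
    """Least-squares estimates (xi_hat, sigma_hat) of equations (3)-(4):
    ordered observations u_i regressed on v_i = v(p_i)."""
    u = sorted(sample)
    v = [NormalDist().inv_cdf(p) for p in positions]   # inverse of equation (1)
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    s_uv = sum(ui * (vi - v_bar) for ui, vi in zip(u, v))
    s_vv = sum((vi - v_bar) ** 2 for vi in v)
    sigma_hat = s_uv / s_vv               # equation (4)
    xi_hat = u_bar - sigma_hat * v_bar    # equation (3)
    return xi_hat, sigma_hat

# Points lying exactly on u = 10 + 2v are recovered exactly.
p = [i / 6 for i in range(1, 6)]
u = [10 + 2 * NormalDist().inv_cdf(pi) for pi in p]
print(graphical_estimates(u, p))          # approximately (10.0, 2.0)
```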

The estimate of σ may be represented by a linear function of the ordered observations $u_1, u_2, \ldots, u_n$, i.e.,

$$\hat\sigma = \sum_{i=1}^{n} c_i u_i, \qquad (5)$$

where

$$c_i = \frac{v_i - \bar v}{\sum_{j=1}^{n}(v_j - \bar v)^2}. \qquad (6)$$
Let us define an optimum choice of the $v_i$ to be one which minimizes the variance of $\hat\sigma$ subject to the condition that $\hat\sigma$ is an unbiased estimate of σ. We first note that if each $v_i$ is increased by a constant, the $c_i$ are unaffected. Hence we may select the $v_i$ so that $\bar v = 0$ without interfering with the optimum choice of the $v_i$ for estimating σ. Next we note that the class of all unbiased estimates of σ which are linear in the ordered observations is available by suitably varying the $v_i$. The reason for this is that the only condition imposed by equation (6) on the $c_i$ is $\sum_{i=1}^{n} c_i = 0$. This condition is implied by unbiasedness, since

$$E\Big(\sum_{i=1}^{n} c_i u_i\Big) = k\sigma + \Big(\sum_{i=1}^{n} c_i\Big)\xi, \qquad (7)$$

where k depends on $c_1, c_2, \ldots, c_n$. It follows that the problem of finding the optimum $v_i$ is equivalent to that of finding the minimum variance unbiased estimate of σ which is linear in the ordered observations.

This problem was treated by Godwin [1]. He presents a table of coefficients to be used with the ordered observations to obtain the minimum variance unbiased estimate of σ. His results were transformed for use in this paper. The "optimum" values of $p_i$ can be found in Table I.
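Any coefficients $c_i$ with $\sum c_i = 0$ are reproduced by (6) when $v_i = c_i / \sum_j c_j^2$ (then $\bar v = 0$ and $\sum v_j^2 = 1/\sum c_j^2$), so one way to turn a table of coefficients into ordinates is $p_i = \Phi(c_i / \sum_j c_j^2)$, with Φ the normal distribution function in (1). This is our reading of "transformed", not a procedure stated by the authors. For n = 2 the minimum variance unbiased linear estimate is $(\sqrt{\pi}/2)(u_2 - u_1)$, since $E(u_2 - u_1) = 2\sigma/\sqrt{\pi}$; the sketch below (function name hypothetical) recovers the Table I entry .28632 to within about 1e-5.

```python
from math import pi, sqrt
from statistics import NormalDist

def ordinates_from_coefficients(c):
    """p_i = Phi(v_i) with v_i = c_i / sum(c_j**2); assumes sum(c) == 0."""
    ss = sum(ci * ci for ci in c)
    return [NormalDist().cdf(ci / ss) for ci in c]

a = sqrt(pi) / 2                       # n = 2: sigma_hat = a * (u2 - u1)
print([round(p, 4) for p in ordinates_from_coefficients([-a, a])])  # [0.2863, 0.7137]
```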


2. BIASED ESTIMATES OF σ


A peculiarity of the problem of estimating σ is that by introducing a slight bias we are able to obtain a better estimate in the sense of minimizing the mean square deviation from σ.

Suppose t is an estimate of θ where E(t) = θ. Consider the statistic at, for which a is defined such that $E\{(at - \theta)^2\}$ is a minimum. In other words, we minimize the expression

$$E\{[(at - a\theta) + (a - 1)\theta]^2\} = a^2\sigma_t^2 + (a - 1)^2\theta^2$$

with respect to a, where $\sigma_t^2$ is the variance of t. This minimum occurs at

$$a = \frac{\theta^2}{\theta^2 + \sigma_t^2}.$$

It then follows that

$$E\{(at - \theta)^2\} = \frac{1}{\dfrac{1}{\sigma_t^2} + \dfrac{1}{\theta^2}}.$$

In the problem under consideration, t is a linear function of the ordered observations whose coefficients add up to zero. Hence $\sigma_t^2$ (for the given sample size) is a multiple of σ², say $k_n\sigma^2$. Furthermore, θ = σ, and therefore the optimum a is given by

$$a = \frac{1}{1 + k_n}.$$

The corresponding mean square deviation about σ is given by

$$E\{(at - \sigma)^2\} = \frac{\sigma^2}{1 + \dfrac{1}{k_n}}.$$

Consider two unbiased estimates, linear in the ordered observations, which have variances $k_n\sigma^2$ and $k_n^*\sigma^2$, where $k_n < k_n^*$. The estimate with variance $k_n\sigma^2$ yields the better biased estimate (of the above type) since

$$\frac{1}{1 + \dfrac{1}{k_n}} < \frac{1}{1 + \dfrac{1}{k_n^*}}.$$

Hence multiplying the $c_i$ contained in the solution of the problem in the preceding section by an appropriate factor, which is equivalent to dividing each $v_i$ by a, leads to the estimate which is optimum in the following sense: among all estimates which are linear in the ordered observations and whose bias is independent of ξ, it has minimum mean square deviation from σ.
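For n = 2 the chain of results can be followed numerically. There $k_2 = \pi/2 - 1 \approx .5708$ (the variance of $(\sqrt{\pi}/2)(u_2 - u_1)$ in units of σ²), so a = 1/(1 + k₂) = 2/π, and dividing $v_1 = -1/\sqrt{\pi}$ by a gives $v_1 = -\sqrt{\pi}/2$. The sketch below (variable names hypothetical) reproduces, to about four decimals, the n = 2 entries .18775 of Table I, row [2], and .36340 of Table II, column [4].

```python
from math import pi, sqrt
from statistics import NormalDist

k2 = pi / 2 - 1                    # Var of the unbiased estimate, in units of sigma**2
a = 1 / (1 + k2)                   # optimum factor a = 1 / (1 + k_n)
msd = k2 / (1 + k2)                # E{(a t - sigma)**2} / sigma**2 = 1 / (1 + 1/k_n)
v_unbiased = -1 / sqrt(pi)         # v_1 for the unbiased estimate (v_2 = -v_1)
p_biased = NormalDist().cdf(v_unbiased / a)   # dividing v_1 by a
print(round(a, 4), round(msd, 4), round(p_biased, 4))  # 0.6366 0.3634 0.1877
```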
3. TABLES

In Table I the $p_i$ values corresponding to the following are presented:

1. the best linear (in the ordered observations) unbiased estimate of σ,
2. the best linear (in the ordered observations) biased estimate of σ,
3. $p_i = i/(n+1)$, and
4. $p_i = (i - \tfrac12)/n$,

for values of n = 2(1)10.


TABLE I

COMPARISON OF THE ORDINATES ($p_i$) USED ON NORMAL
PROBABILITY PAPER FOR ESTIMATING THE
MEAN AND STANDARD DEVIATION

  n         p1       p2       p3       p4       p5
  2   [1]  .28632
      [2]  .18775
      [3]  .33333
      [4]  .25000
  3   [1]  .19870   .50000
      [2]  .14020   .50000
      [3]  .25000   .50000
      [4]  .16667   .50000
  4   [1]  .14913   .40034
      [2]  .10982   .38288
      [3]  .20000   .40000
      [4]  .12500   .37500
  5   [1]  .11775   .33333   .50000
      [2]  .089398  .31272   .50000
      [3]  .16667   .33333   .50000
      [4]  .10000   .30000   .50000
  6   [1]  .096373  .28489   .42964
      [2]  .074907  .26485   .42229
      [3]  .14286   .28571   .42857
      [4]  .083333  .25000   .41667
  7   [1]  .080456  .24824   .37660   .50000
      [2]  .063732  .22986   .36624   .50000
      [3]  .12500   .25000   .37500   .50000
      [4]  .071429  .21429   .35714   .50000
  8   [1]  .069624  .21959   .33500   .44544
      [2]  .056027  .20289   .32348   .44139
      [3]  .11111   .22222   .33333   .44444
      [4]  .062500  .18750   .31250   .43750
  9   [1]  .060607  .19659   .30146   .40162   .50000
      [2]  .049425  .18158   .28978   .39537   .50000
      [3]  .10000   .20000   .30000   .40000   .50000
      [4]  .055556  .16667   .27778   .38889   .50000
 10   [1]  .053568  .17773   .27386   .36559   .45537
      [2]  .044192  .16422   .26245   .35818   .45281
      [3]  .090909  .18182   .27273   .36364   .45455
      [4]  .050000  .15000   .25000   .35000   .45000





NOTE: When i > n/2, use $p_i = 1 - p_{n+1-i}$.

[1] These values of $p_i$ correspond to the ordinates that yield the minimum variance unbiased estimate of σ which is linear in the ordered observations.

[2] These values of $p_i$ correspond to the ordinates that yield the biased estimate of σ which has minimum mean square deviation from σ and which is linear in the ordered observations.

[3] These values of $p_i$ correspond to i/(n+1).

[4] These values of $p_i$ correspond to (i − ½)/n.

Table II presents the mean square deviations from σ of the estimates which are linear in the ordered observations, the variance of the minimum variance non-linear unbiased estimate of σ, and the mean square deviation from σ of the non-linear biased estimate having minimum mean square deviation. The minimum variance non-linear unbiased estimate of σ [1] is

$$\frac{\Gamma\big(\frac{n-1}{2}\big)}{\sqrt{2}\,\Gamma\big(\frac{n}{2}\big)}\sqrt{\sum_{i=1}^{n}(u_i - \bar u)^2}$$

and the corresponding biased estimate is

$$\frac{\Gamma\big(\frac{n}{2}\big)}{\sqrt{2}\,\Gamma\big(\frac{n+1}{2}\big)}\sqrt{\sum_{i=1}^{n}(u_i - \bar u)^2}.$$
This latter estimate is optimum in the sense that among all estimates having an expected value which is a multiple of σ and a variance proportional to σ², it has minimum mean square deviation from σ. This estimate is referred to above and subsequently as the non-linear biased estimate having minimum mean square deviation from σ.


TABLE II

COMPARISON OF THE MEAN SQUARE DEVIATIONS
FROM σ OF VARIOUS ESTIMATES OF σ (IN UNITS OF σ²)

  n      [1]       [2]       [3]       [4]       [5]       [6]
  2    .57080    .57084    .36338    .36340   1.07533    .42611
  3    .27324    .27549    .21460    .21599    .49856    .22649
  4    .17810    .18006    .15117    .15259    .31559    .15558
  5    .13177    .13332    .11643    .11764    .22751    .11872
  6    .10447    .10571    .09459    .09560    .17630    .09605
  7    .08650    .08714    .07961    .08015    .14306    .08067
  8    .07379    .07469    .06872    .06950    .11987    .06954
  9    .06432    .06501    .06044    .06105    .10283    .06111
 10    .05701    .05759    .05393    .05445    .08981    .05449

[1] Variance of the minimum variance non-linear unbiased estimate of σ.

[2] Variance of the minimum variance unbiased estimate which is linear in the ordered observations.

[3] Mean square deviation from σ of the non-linear biased estimate which has minimum mean square deviation.

[4] Mean square deviation from σ of the biased estimate which is linear in the ordered observations and which has minimum mean square deviation.

[5] Mean square deviation from σ of the biased estimate based upon the ordinates i/(n+1).

[6] Mean square deviation from σ of the biased estimate based upon the ordinates (i − ½)/n.
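Both non-linear columns of Table II follow from the moments of $S = \sqrt{\sum (u_i - \bar u)^2}$ for a normal sample, $E(S) = \sigma\sqrt{2}\,\Gamma(n/2)/\Gamma((n-1)/2)$ and $E(S^2) = (n - 1)\sigma^2$: in units of σ², column [1] is $E(S^2)/[E(S)]^2 - 1$ and column [3] is $1 - [E(S)]^2/E(S^2)$. This derivation is ours, offered only as a check; the sketch (names hypothetical) reproduces the printed values to the precision shown.

```python
from math import exp, lgamma, sqrt

def mean_s(n):
    """E[sqrt(sum (u_i - u_bar)**2)] / sigma for a normal sample of size n."""
    return sqrt(2) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

def table2_columns_1_and_3(n):
    es, es2 = mean_s(n), n - 1            # E(S)/sigma and E(S^2)/sigma^2
    unbiased_var = es2 / es ** 2 - 1      # column [1]
    biased_msd = 1 - es ** 2 / es2        # column [3]
    return unbiased_var, biased_msd

for n in range(2, 11):
    v1, m3 = table2_columns_1_and_3(n)
    print(n, round(v1, 5), round(m3, 5))
```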

In Table III these mean square deviations are transformed to efficiencies. For the unbiased estimates the ratio of the variances is computed; for the biased estimates the ratio of the minimum mean square deviation from σ to the mean square deviation of each estimate is computed.

It is evident from Tables II and III that the optimum choice of the $p_i$ depends upon whether an unbiased estimate is necessary or whether a biased estimate can be tolerated. In either case, the graphical estimates compare very favorably with the optimum estimates of the standard deviation. For n ≤ 10 the efficiency of the optimum unbiased graphical estimate relative to the optimum unbiased estimate is greater than 98 per cent, as is also the efficiency of the optimum biased graphical estimate relative to the optimum biased estimate.
TABLE III-A: EFFICIENCY OF THE OPTIMUM UNBIASED ESTIMATE OF σ
(RATIO OF VARIANCES)

TABLE III-B: COMPARISON OF EFFICIENCY OF VARIOUS BIASED ESTIMATES OF σ
(RATIO OF MEAN SQUARE DEVIATIONS FROM σ)

          III-A      III-B
  n        [1]        [2]      [3]      [4]
  2      100.00      99.99    33.79    85.28
  3       99.19      99.36    43.04    94.75
  4       98.92      99.07    47.90    97.17
  5       98.84      98.97    51.18    98.07
  6       98.83      98.94    53.65    98.48
  7       98.86      99.33    55.65    98.68
  8       98.90      99.88    57.33    98.82
  9       98.94      99.00    57.78    98.90
 10       98.99      99.04    60.05    98.97

[1] This entry is the ratio of the variance of the minimum variance non-linear unbiased estimate to the variance of the minimum variance unbiased estimate which is linear in the ordered observations, i.e., columns [1]/[2] in Table II.

[2] This entry is the ratio of the mean square deviation from σ of the non-linear biased estimate having minimum mean square deviation to the mean square deviation from σ of the minimum mean square deviation biased estimate which is linear in the ordered observations, i.e., columns [3]/[4] in Table II.

[3] This entry is the ratio of the mean square deviation from σ of the non-linear biased estimate having minimum mean square deviation to the mean square deviation from σ of the biased estimate based upon the ordinates i/(n+1), i.e., columns [3]/[5] in Table II.

[4] This entry is the ratio of the mean square deviation from σ of the non-linear biased estimate having minimum mean square deviation to the mean square deviation from σ of the biased estimate based upon the ordinates (i − ½)/n, i.e., columns [3]/[6] in Table II.
ANALYSIS OF SIMPLE LATTICE DESIGNS WITH
UNEQUAL SETS OF REPLICATIONS*


PAUL MEIER
The Johns Hopkins University


INTRODUCTION 


THE lattice, or pseudo-factorial, designs first introduced by Yates [11] and various generalizations of them [7, 8] have proved to be
quite useful, particularly in agricultural applications. These designs 
are suited to experimental situations in which there are a large number 
of varieties, treatments, or what have you, to be compared under 
conditions which require a relatively small block size. 

Among the wide variety of available incomplete block designs the 
lattices are distinguished by the fact that they combine a relatively 
simple type of analysis with a fair degree of flexibility in the choice of 
the number of varieties to be tested, the number of replications to be 
used, and so forth. 

In most of these designs the basic construction consists of p replication patterns. The instructions for analysis, such as those given in [2], allow for any number of repetitions of any subset of this basic collection of replication patterns. Thus if we use p′ patterns from the basic collection and repeat this subset r times, we have a design with a total of rp′ replications. For example, if we wish to compare 25 varieties in a 5 × 5 lattice design, we have available a basic set of 6 patterns. An experiment with 6 replications may be designed using all six patterns once, using each of three patterns twice, or using just two patterns each repeated three times. (See Cochran and Cox [2, p. 281].) However, if we wish to use 5 replications, our choice would be limited to the quintuple lattice, and the case of 7 replications is not covered at all. The possibilities are even further restricted in the case of a 6 × 6 lattice, for which the basic set includes only 3 patterns.
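The counting in this paragraph follows a simple rule: with R replications and a basic set of p patterns, an equal-frequency design uses p′ ≥ 2 patterns, each repeated r = R/p′ times, so p′ must divide R and cannot exceed p. The helper below is a hypothetical illustration of that rule, not a procedure from the paper.

```python
def equal_frequency_options(total_reps, basic_patterns):
    """(p', r) pairs with p' * r == total_reps and 2 <= p' <= basic_patterns."""
    return [(p, total_reps // p)
            for p in range(2, basic_patterns + 1)
            if total_reps % p == 0]

print(equal_frequency_options(6, 6))  # [(2, 3), (3, 2), (6, 1)]  5 x 5, six replications
print(equal_frequency_options(5, 6))  # [(5, 1)]                  quintuple lattice only
print(equal_frequency_options(7, 6))  # []                        not covered
print(equal_frequency_options(6, 3))  # [(2, 3), (3, 2)]          6 x 6 lattice
```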

Even when the above prescription can be followed, it may not be the most desirable procedure. The more patterns from the basic set used, the more tedious the analysis becomes, so that in situations where the complexity and cost of calculations weigh heavily, the designs using more patterns, although generally more balanced, may lose favor. Conversely, it may be possible to achieve a more nearly balanced design by dropping the requirement that each pattern be repeated the same number of times.


* This paper is a revision of parts III and IV of a doctoral thesis submitted to Princeton University. Preparation of this paper was assisted by a contract with the Office of Naval Research. Paper No. 293 from the Department of Biostatistics, School of Hygiene and Public Health, The Johns Hopkins University.

Finally, although it may not be a frequent occurrence, there are 
unfortunate occasions on which for one reason or another essentially 
all of one or more replications is lost, thus destroying the symmetry of 
the original arrangement. 

For these reasons it may be of interest to note the extent to which 
the ordinary analysis for lattice designs must be modified for an experi- 
ment in which the p’ patterns from the basic set are repeated with 
unequal frequencies. 

In this paper we investigate the simplest case, the simple square 
lattice, which uses only two replication patterns. Following the usual 
convention replications following one pattern are called X-replications 
and those following the other pattern are called Y-replications. We will 
refer to a design using n X-replications and m Y-replications as an 
(n, m) design. 

In addition to the question of relative ease of analysis, the efficiency 
of unequal sets designs relative to alternatives is of interest and this is 
considered in section I-G. The criterion used to measure efficiency is 
the reciprocal of the average variance of all possible comparisons be- 
tween varieties, which is a kind of average “information,” or invariance, 
of comparisons. In experiments of identical construction having dif- 
ferent numbers of replications this quantity will be proportional to 
the number of replications. Thus, to compare the efficiencies of two 
designs which use different numbers of replications we use the average 
invariance of comparisons divided by the number of replications. This 
gives us an absolute scale on which to compare any two designs involv- 
ing the same number of varieties. 
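As a sketch of this criterion (function names hypothetical), the per-replication efficiency is the reciprocal of the average variance of all pairwise variety comparisons, divided by R. As a baseline we assume a randomized complete block layout, where every comparison has variance 2σ²/R; the criterion then equals 1/(2σ²) whatever R is, which is the proportionality the text relies on.

```python
from itertools import combinations

def efficiency_per_replication(pair_variance, n_varieties, replications):
    """Reciprocal of the average variance of all pairwise variety comparisons,
    divided by the number of replications."""
    pairs = list(combinations(range(n_varieties), 2))
    average = sum(pair_variance(a, b) for a, b in pairs) / len(pairs)
    return 1 / average / replications

def rcb_pair_variance(sigma2, replications):
    """Hypothetical baseline: every comparison in a randomized complete
    block design has variance 2 * sigma2 / R."""
    return lambda a, b: 2 * sigma2 / replications

for reps in (2, 3, 5):
    print(reps, efficiency_per_replication(rcb_pair_variance(1.0, reps), 25, reps))
```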

It is not claimed that this is an ideal criterion, but it is felt to be satis- 
factory for the purpose at hand, namely, to compare an unequal sets 
lattice design with an equal sets design. We will also compare the (2,1) 
design with the triple lattice and the (3, 2) design with the quintuple 
lattice. Since a 5 × 5 lattice is generally considered to be the smallest
to which the recovery of inter-block information should be applied, 
we give numerical calculations for it. The disadvantage of the unequal 
sets lattice will be less in larger designs. 

788 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 19%


It will be seen that the analysis for the unequal sets simple lattice designs differs only slightly from the usual analysis given for the equal sets designs. In view of this simplicity, such designs may be considered legitimate competitors with the more nearly balanced alternatives, provided the relative efficiency is not too low. The maximum losses of efficiency for the (2, 1) and (3, 2) designs relative to the equal sets designs and to the triple and quintuple lattices, respectively, are given below for the 5 × 5 lattice.


MAXIMUM LOSS IN EFFICIENCY FOR A 5 × 5 SIMPLE
LATTICE WITH UNEQUAL SETS OF REPLICATIONS
RELATIVE TO OTHER DESIGNS

Alternative Design                 (2, 1) Design    (3, 2) Design
Simple Lattice with Equal Sets          6%               2%
Triple Lattice                         12%               -
Quintuple Lattice                       -               11%

These maximum losses are realized when block variability is large or the intra-block analysis is used.

The paper is divided into two parts. The first part deals with the 
theory of the analysis and the model behind it. The second part con- 
sists of a numerical example. 


I. DESIGN AND ANALYSIS
A. Field Design and Mathematical Model


The simple lattice designs permit the comparison of k² varieties in
blocks of size k. Thus a complete replication consists of k blocks, each 
containing k varieties. The construction of replication patterns begins 
by arranging the varieties in a square array. In the first, or X, pattern 
those varieties appearing in any one row go into the same block. In the 
second, or Y, pattern those varieties appearing in any one column go 
into the same block. In agricultural work the blocks thus composed 
would be assigned at random to the blocks laid out in a replication, 
and the varieties within a block would be assigned at random to the 
plots within a block. In other types of work the analogous method of 
assignment would be followed. An excellent description of this proce- 
dure is given by Cochran and Cox [2]. 
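The construction can be sketched as follows. The function name and the use of Python's `random` module for the randomization are illustrative assumptions: varieties are indexed by their (row, column) position in the k × k array, X-replications use rows as blocks, Y-replications use columns, and both the order of blocks within a replication and the order of varieties within a block are randomized.

```python
import random

def simple_lattice_plan(k, n_x, n_y, seed=None):
    """Field plan for an (n_x, n_y) simple lattice on k*k varieties.
    Returns a list of (pattern, blocks); each block lists (row, col) varieties."""
    rng = random.Random(seed)
    rows = [[(i, j) for j in range(k)] for i in range(k)]   # X pattern blocks
    cols = [[(i, j) for i in range(k)] for j in range(k)]   # Y pattern blocks
    plan = []
    for pattern, blocks in [("X", rows)] * n_x + [("Y", cols)] * n_y:
        # randomize plots within each block, then blocks within the replication
        rep = [rng.sample(block, k) for block in blocks]
        rng.shuffle(rep)
        plan.append((pattern, rep))
    return plan

for pattern, blocks in simple_lattice_plan(3, 2, 1, seed=0):
    print(pattern, blocks)
```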

The mathematical model may be described as follows. Consider the varieties in the square array from which the design is composed. Denote by $v_{ij}$ the "true" mean of the variety in the ith row and jth column (less the over-all mean of the varieties). Denote by $x_{ijr}$ the value observed for the plot in the rth replication containing the variety $v_{ij}$. Then

$$x_{ijr} = \mu + A_r + v_{ij} + \alpha_{ir} + \beta_{jr} + \epsilon_{ijr},$$

where

μ = grand mean
$A_r$ = replication effect
$v_{ij}$ = variety effect
$\alpha_{ir}$ = ith block effect if the rth replication is in the X set; 0 if the rth replication is in the Y set
$\beta_{jr}$ = jth block effect if the rth replication is in the Y set; 0 if the rth replication is in the X set
$\epsilon_{ijr}$ = residual effect.


In the analysis that follows we will assume that the $\epsilon_{ijr}$ may be considered to be normally and independently distributed about zero with variance $\sigma_e^2$. For the intra-block analysis we need no other assumptions, since block effects are completely eliminated from the varietal comparisons. A direct application of Cochran's theorem [4] to the analysis of variance shows that the residual mean square is a proper estimate of error, and with a little additional calculation the analysis can also be made to provide an exact test of significance for the null hypothesis that all $v_{ij}$ are equal.

For the inter-block estimation of varietal effects we must make some assumptions about the block effects, $\alpha_{ir}$ and $\beta_{jr}$. Besides the assumption of randomness, ensured by the method of assigning blocks within replications, it is necessary to assume that block variability is the same within each replication. To avoid expository clumsiness it is convenient to assume, when dealing with the inter-block analysis, that the block effects also are normally and independently distributed, with variance $\sigma_b^2$. (The calculation of average mean squares requires only that the effects be uncorrelated, but this is not sufficient to justify the application of the t-distribution to varietal comparisons. Of course, strict normality need not be required; it has been shown that for many purposes deviations from normality do not seriously affect the analysis [5].) The quantities $\sigma_r^2$ and $\sigma_v^2$ which appear in the expressions for the average mean squares represent the variation due to replication and varietal effects. Assumptions about the nature of the distribution of these effects or their random allocation have no bearing on either type of analysis. Except in discussing the recovery of inter-block information, block effects will be considered fixed rather than random effects.

As are most of the familiar experimental designs, the unequal sets simple lattice is a partially balanced incomplete block (p.b.i.b.) design (see discussion in [2]). The application to particular designs of the general method for analyzing a p.b.i.b. design is nicely demonstrated in a paper by Nair [10]. However, the inequality in number of X- and Y-replications in our case gives us a design with three rather than the usual two associate classes, and it would appear from this viewpoint that the unequal sets case is essentially more complex than the equal sets designs. Nevertheless, the analysis follows directly by application of Cochran's theorem without making any appeal to general results from the theory of experimental designs. In this form the close similarity of the analysis to that for equal sets is obvious.

The problem of choosing a suitable notation for exposition is by no means simple. Our object is to provide a notation which suggests the operations involved without being unduly cumbersome. For quantities which are sums or averages over all possible values of an index we use the fairly common convention that a dot (·) replacing an index means the average over all possible values of that index, whereas a plus (+) replacing an index means the sum over all possible values of that index. For example:

$$x_{ij\cdot} = \frac{1}{R}\sum_{r} x_{ijr}, \qquad x_{ij+} = \sum_{r} x_{ijr}.$$

The only difficulty with this notation is that a mean taken over X-replications, for example, takes the clumsy form

$$\frac{1}{R_1}\sum_{r=1}^{R_1} x_{ijr},$$

where we have, say, $R_1$ X-replications and $R_2$ Y-replications. To avoid writing such expressions we use the notation $x'_{ij}$ for the above, and $x''_{ij}$ for a mean over Y-replications. It follows that $R\,x_{ij\cdot} = R_1 x'_{ij} + R_2 x''_{ij}$, where $R = R_1 + R_2$ is the total number of replications.


B. Analysis of Variance 
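The analysis¹ follows from the identity

$$\sum_{ijr}(x_{ijr} - x_{\cdot\cdot\cdot})^2 \qquad \text{(Total)}$$

$$= k^2\sum_{r}(x_{\cdot\cdot r} - x_{\cdot\cdot\cdot})^2 \qquad \text{(Replications)}$$

$$+ R\sum_{ij}(x_{ij\cdot} - x_{\cdot\cdot\cdot})^2 \qquad \text{(Varieties ignoring blocks)}$$

$$+ k\sum_{r=1}^{R_1}\sum_{i}\big[(x_{i\cdot r} - x_{\cdot\cdot r}) - (x'_{i\cdot} - x'_{\cdot\cdot})\big]^2 + k\sum_{r=R_1+1}^{R}\sum_{j}\big[(x_{\cdot jr} - x_{\cdot\cdot r}) - (x''_{\cdot j} - x''_{\cdot\cdot})\big]^2 \qquad \text{(Blocks A)}$$

$$+ \frac{kR_1R_2}{R}\Big\{\sum_{i}\big[(x'_{i\cdot} - x'_{\cdot\cdot}) - (x''_{i\cdot} - x''_{\cdot\cdot})\big]^2 + \sum_{j}\big[(x'_{\cdot j} - x'_{\cdot\cdot}) - (x''_{\cdot j} - x''_{\cdot\cdot})\big]^2\Big\} \qquad \text{(Blocks B, eliminating varieties)}$$

$$+ \sum_{r=1}^{R_1}\sum_{ij}(x_{ijr} - x_{i\cdot r} - x_{ij\cdot} + x_{i\cdot\cdot} + x_{\cdot j\cdot} - x'_{\cdot j} + x'_{\cdot\cdot} - x_{\cdot\cdot\cdot})^2$$
$$+ \sum_{r=R_1+1}^{R}\sum_{ij}(x_{ijr} - x_{\cdot jr} - x_{ij\cdot} + x_{\cdot j\cdot} + x_{i\cdot\cdot} - x''_{i\cdot} + x''_{\cdot\cdot} - x_{\cdot\cdot\cdot})^2 \qquad \text{(Residual)}$$


¹ Expressions suitable for computation are given with the numerical example in part II.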


It can be verified directly that the sums of cross products such as

$$\sum_{ijr}(x_{\cdot\cdot r} - x_{\cdot\cdot\cdot})(x_{ij\cdot} - x_{\cdot\cdot\cdot})$$

are all zero. From this it follows that the rank of the quadratic form is equal to the sum of the ranks of the various components, and it follows from Cochran's theorem [4] that the component sums of squares are distributed independently. With the exception of the residual terms, the ranks of the quadratic forms can be seen directly, and the rank of the residual may be obtained by subtraction.
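Because every cross product vanishes, the decomposition holds exactly for any data set, which makes it easy to check numerically. The sketch below (names hypothetical; pure Python) takes x[r][i][j] as the value for variety (i, j) in replication r, with the first R₁ replications forming the X set, computes the replications, varieties-ignoring-blocks, Blocks A, Blocks B and residual sums of squares term by term, and compares their sum with the total.

```python
import random

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def lattice_sums_of_squares(x, k, r1):
    """x[r][i][j]: plot value for variety (i, j) in replication r.
    Replications 0..r1-1 form the X set (blocks = rows i), the rest the
    Y set (blocks = columns j). Returns the component sums of squares."""
    R = len(x)
    r2 = R - r1
    X, Y, K = range(r1), range(r1, R), range(k)
    g = mean(x[r][i][j] for r in range(R) for i in K for j in K)
    rep = [mean(x[r][i][j] for i in K for j in K) for r in range(R)]
    var = [[mean(x[r][i][j] for r in range(R)) for j in K] for i in K]
    row = [[mean(x[r][i]) for i in K] for r in range(R)]               # row means per rep
    col = [[mean(x[r][i][j] for i in K) for j in K] for r in range(R)] # column means per rep
    row_all = [mean(var[i]) for i in K]
    col_all = [mean(var[i][j] for i in K) for j in K]
    gx, gy = mean(rep[r] for r in X), mean(rep[r] for r in Y)          # X-set, Y-set means
    row_x = [mean(row[r][i] for r in X) for i in K]
    row_y = [mean(row[r][i] for r in Y) for i in K]
    col_x = [mean(col[r][j] for r in X) for j in K]
    col_y = [mean(col[r][j] for r in Y) for j in K]

    ss = {"total": sum((x[r][i][j] - g) ** 2 for r in range(R) for i in K for j in K)}
    ss["replications"] = k * k * sum((rep[r] - g) ** 2 for r in range(R))
    ss["varieties"] = R * sum((var[i][j] - g) ** 2 for i in K for j in K)
    ss["blocks_A"] = k * (
        sum(((row[r][i] - rep[r]) - (row_x[i] - gx)) ** 2 for r in X for i in K)
        + sum(((col[r][j] - rep[r]) - (col_y[j] - gy)) ** 2 for r in Y for j in K))
    ss["blocks_B"] = k * r1 * r2 / R * (
        sum(((row_x[i] - gx) - (row_y[i] - gy)) ** 2 for i in K)
        + sum(((col_x[j] - gx) - (col_y[j] - gy)) ** 2 for j in K))
    ss["residual"] = (
        sum((x[r][i][j] - row[r][i] - var[i][j] + row_all[i] + col_all[j]
             - col_x[j] + gx - g) ** 2 for r in X for i in K for j in K)
        + sum((x[r][i][j] - col[r][j] - var[i][j] + col_all[j] + row_all[i]
               - row_y[i] + gy - g) ** 2 for r in Y for i in K for j in K))
    return ss

rng = random.Random(7)
k, r1, r2 = 5, 2, 1                      # a (2, 1) design on a 5 x 5 lattice
data = [[[rng.gauss(0, 1) for _ in range(k)] for _ in range(k)] for _ in range(r1 + r2)]
ss = lattice_sums_of_squares(data, k, r1)
parts = sum(v for name, v in ss.items() if name != "total")
print(round(ss["total"], 6), round(parts, 6))   # the two agree
```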
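Also, excepting the residual, the average mean squares are easy to determine. For example,

$$\operatorname{Ave}\Big\{\sum_r (x_{\cdot\cdot r} - x_{\cdot\cdot\cdot})^2\Big\} = \operatorname{Ave}\Big\{\sum_r \big[(A_r - A_\cdot) + (\alpha_{\cdot r} + \beta_{\cdot r} - \alpha_{\cdot\cdot} - \beta_{\cdot\cdot}) + (\epsilon_{\cdot\cdot r} - \epsilon_{\cdot\cdot\cdot})\big]^2\Big\}.$$

As the various effects are assumed to be independent, the average values of the cross terms are zero and the expression becomes

$$\operatorname{Ave}\Big\{\sum_r (A_r - A_\cdot)^2 + \sum_r (\alpha_{\cdot r} + \beta_{\cdot r} - \alpha_{\cdot\cdot} - \beta_{\cdot\cdot})^2 + \sum_r (\epsilon_{\cdot\cdot r} - \epsilon_{\cdot\cdot\cdot})^2\Big\} = (R - 1)\sigma_r^2 + \frac{R - 1}{k}\sigma_b^2 + \frac{R - 1}{k^2}\sigma_e^2,$$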
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and the average mean square for replications is k*?/(R—1) times this 
expression. In the same manner we find the other average means 
squares, as usual, obtaining the residual by subtraction. The result of 
these calculations may be tabulated as follows. 








Source of variation                 Average mean square                                   Degrees of freedom
Replications                        $k^2\sigma_A^2+k\sigma_B^2+\sigma_e^2$                 $R-1$
Varieties ignoring blocks           $R\sigma_v^2+\frac{k}{k+1}\sigma_B^2+\sigma_e^2$       $k^2-1$
Blocks A                            $k\sigma_B^2+\sigma_e^2$                               $(R-2)(k-1)$
Blocks B (eliminating varieties)    $\frac12 k\sigma_B^2+\sigma_e^2$                       $2(k-1)$
Residual                            $\sigma_e^2$                                           $(k-1)(Rk-k-1)$
Total                                                                                      $Rk^2-1$





C. Tests of Significance 


The analysis of variance table just presented does not suggest an 
exact test of significance for differences between varietal means. It may 
be, especially in an experiment with a large number of varieties, that 
the experimenter well knows that differences do exist and the signifi- 
cance test is really beside the point. Cochran [3] has described rather 
completely both approximate and exact tests of significance in the 
case of equal sets of replications. The unequal sets case may be treated 
in the same way. 

An approximate test may be made by viewing the experiment as a 
randomized complete block design. That is, we regard the variation due 
to blocks as part of the experimental error. This is given effect by pool- 
ing the blocks and residual sums of squares and using the pooled mean 
square to test the significance of the varieties ignoring blocks mean 
square. 

If we regard the block effects as randomly drawn from a normal
distribution with mean zero and variance $\sigma_B^2$, the distribution of the
test ratio under the null hypothesis is that of the ratio of two mixtures
of chi-squares, each with the same average value. Thus the null
distribution is not quite the same as the F distribution. However, the
error arising from this source appears to be quite small.

An exact test can be found from an alternative analysis of variance.
The foregoing analysis can be further subdivided as follows. The sum
of squares for varieties ignoring blocks may be written

$$R\sum_{ij}(x_{ij\cdot}-x_{\cdot\cdot\cdot})^2 = kR\Big\{\sum_i(x_{i\cdot\cdot}-x_{\cdot\cdot\cdot})^2+\sum_j(x_{\cdot j\cdot}-x_{\cdot\cdot\cdot})^2\Big\} + R\sum_{ij}(x_{ij\cdot}-x_{i\cdot\cdot}-x_{\cdot j\cdot}+x_{\cdot\cdot\cdot})^2,$$

or SS (varieties ignoring blocks) = SS (varietal main effects ignoring
blocks) + SS (varietal interactions). (The terminology is borrowed from
the pseudo-factorial description of the analysis [11].)

The average mean squares and degrees of freedom are as follows. 








Source of variation                      Average mean square                                   Degrees of freedom
Varieties ignoring blocks                $R\sigma_v^2+\frac{k}{k+1}\sigma_B^2+\sigma_e^2$       $k^2-1$
Varietal main effects ignoring blocks    $R\sigma_v^2+\frac12 k\sigma_B^2+\sigma_e^2$           $2(k-1)$
Varietal interactions                    $R\sigma_v^2+\sigma_e^2$                               $(k-1)^2$





Now the SS (varietal main effects ignoring blocks) can be replaced by
an expression for SS (varietal main effects eliminating blocks):

$$\text{SS (varietal main effects eliminating blocks)} = kR_1\sum_j(\bar{x}_{\cdot j}-\bar{x}_{\cdot\cdot})^2 + kR_2\sum_i(\tilde{x}_{i\cdot}-\tilde{x}_{\cdot\cdot})^2.$$

Its average mean square is $\frac12 R\sigma_v^2+\sigma_e^2$, again on $2(k-1)$ degrees of
freedom. Unfortunately this sum of squares is not independent² of blocks
B. To restore independence we must replace blocks B by blocks $B_u$ =
SS (blocks B ignoring varieties):

$$\text{SS (blocks B}_u) = kR_1\sum_i(\bar{x}_{i\cdot}-\bar{x}_{\cdot\cdot})^2 + kR_2\sum_j(\tilde{x}_{\cdot j}-\tilde{x}_{\cdot\cdot})^2,$$

whose average mean square is $\frac12 R\sigma_v^2+k\sigma_B^2+\sigma_e^2$ on $2(k-1)$ degrees of
freedom. An exact test of significance for varietal effects is now
provided by the pooled mean square for varieties eliminating blocks (on
$k^2-1$ degrees of freedom) against the residual. We give below the
average mean squares corresponding to this analysis of variance table.

² For this reason, the present writer prefers to avoid presenting a single table including both varieties eliminating blocks and blocks eliminating varieties as in [10]. The degrees of freedom in such a table add up correctly, but the sums of squares do not.








Source of variation                           Average mean square                               Degrees of freedom
Replications                                  $k^2\sigma_A^2+k\sigma_B^2+\sigma_e^2$             $R-1$
Varietal main effects (eliminating blocks)    $\frac12 R\sigma_v^2+\sigma_e^2$                   $2(k-1)$
Varietal interactions                         $R\sigma_v^2+\sigma_e^2$                           $(k-1)^2$
Blocks A                                      $k\sigma_B^2+\sigma_e^2$                           $(R-2)(k-1)$
Blocks $B_u$                                  $\frac12 R\sigma_v^2+k\sigma_B^2+\sigma_e^2$       $2(k-1)$
Residual                                      $\sigma_e^2$                                       $(k-1)(Rk-k-1)$





D. Estimation of Varietal Means

In the following we will use the notation $v_{ij}'$ for the intra-block
estimate of the true varietal mean, $v_{ij}$, and $v_{ij}''$ for the inter-block
estimate. $v_{ij}'$ is obtained by averaging the appropriate values, one from
each replication, and adding two corrections to eliminate block effects,
one for the X-replications and one for the Y-replications:

$$v_{ij}' = x_{ij\cdot} + C_{x_i} + C_{y_j},$$

where

$$C_{x_i} = \frac{R_1}{R}\big[(\tilde{x}_{i\cdot}-\tilde{x}_{\cdot\cdot})-(\bar{x}_{i\cdot}-\bar{x}_{\cdot\cdot})\big], \qquad C_{y_j} = \frac{R_2}{R}\big[(\bar{x}_{\cdot j}-\bar{x}_{\cdot\cdot})-(\tilde{x}_{\cdot j}-\tilde{x}_{\cdot\cdot})\big].$$

An easy calculation shows that block effects are completely eliminated
from this estimate. However, the elimination of block error necessarily
introduces additional residual error. If a reasonable estimate of the
ratio of block variance to residual variance is available, a partial
correction will give an estimate with smaller variance. Thus, we consider
an estimate of the form

$$v_{ij}'' = x_{ij\cdot} + \lambda C_{x_i} + \mu C_{y_j},$$

where $\lambda$ and $\mu$ are chosen to minimize the variance of $v_{ij}''$. These
minimizing values and the method of estimating them are discussed in the
next two sections.
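The claim that block effects are completely eliminated can be illustrated directly. The sketch below is ours, not the author's computing scheme. It assumes the corrections $C_{x_i} = \frac{R_1}{R}[(\tilde{x}_{i\cdot}-\tilde{x}_{\cdot\cdot})-(\bar{x}_{i\cdot}-\bar{x}_{\cdot\cdot})]$ and $C_{y_j} = \frac{R_2}{R}[(\bar{x}_{\cdot j}-\bar{x}_{\cdot\cdot})-(\tilde{x}_{\cdot j}-\tilde{x}_{\cdot\cdot})]$, with the hypothetical `x[r][i][j]` layout (X-replications first, blocks = rows; Y-replications after, blocks = columns). It builds error-free data from arbitrary variety, replication and block effects and checks that $v'_{ij}$ differs from the true varietal values only by a constant.

```python
import random
from statistics import fmean


def intra_block_means(x, R1):
    """v'_ij = x_ij. + C_x_i + C_y_j for an (R1, R2) simple lattice.

    x[r][i][j]: replications r < R1 are X-replications (blocks = rows i),
    the others are Y-replications (blocks = columns j).
    """
    R, k = len(x), len(x[0])
    R2 = R - R1
    K, X, Y = range(k), range(R1), range(R1, R)
    xb_row = [fmean(x[r][i][j] for r in X for j in K) for i in K]
    xt_row = [fmean(x[r][i][j] for r in Y for j in K) for i in K]
    xb_col = [fmean(x[r][i][j] for r in X for i in K) for j in K]
    xt_col = [fmean(x[r][i][j] for r in Y for i in K) for j in K]
    xb, xt = fmean(xb_row), fmean(xt_row)
    cx = [R1 / R * ((xt_row[i] - xt) - (xb_row[i] - xb)) for i in K]   # C_x_i
    cy = [R2 / R * ((xb_col[j] - xb) - (xt_col[j] - xt)) for j in K]   # C_y_j
    return [[fmean(x[r][i][j] for r in range(R)) + cx[i] + cy[j] for j in K] for i in K]


# Error-free data: variety + replication + block effects only.
random.seed(2)
k, R1, R2 = 5, 2, 1
true_v = [[random.uniform(10, 20) for _ in range(k)] for _ in range(k)]
rep_eff = [random.gauss(0, 3) for _ in range(R1 + R2)]
blk_eff = [[random.gauss(0, 3) for _ in range(k)] for _ in range(R1 + R2)]
x = [[[true_v[i][j] + rep_eff[r] + blk_eff[r][i if r < R1 else j] for j in range(k)]
      for i in range(k)] for r in range(R1 + R2)]
est = intra_block_means(x, R1)
shift = [est[i][j] - true_v[i][j] for i in range(k) for j in range(k)]
print(max(shift) - min(shift))  # essentially zero: only a constant shift remains
```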


E. Variances of Varietal Differences

The varietal differences will be of three kinds, each having a different
variance:

1) both varieties in the same X block
2) both varieties in the same Y block
3) varieties not in the same X or Y block.

The variances of varietal differences for arbitrarily fixed values of
$\lambda$ and $\mu$ can be calculated directly. For example, for varieties in the
same X block we have

$$V_1 = \operatorname{Var}\{v_{11}''-v_{12}''\} = \frac{2\sigma_e^2}{Rk}\Big(k+\frac{R_2}{R_1}\mu^2\Big) + \frac{2(1-\mu)^2R_2\sigma_B^2}{R^2}.$$

It is easily seen that these variances are all minimized by the same
values of $\lambda$ and $\mu$, namely

$$\lambda^* = \frac{R_2k\sigma_B^2}{R_2k\sigma_B^2+R\sigma_e^2}, \qquad \mu^* = \frac{R_1k\sigma_B^2}{R_1k\sigma_B^2+R\sigma_e^2},$$

and for these values of $\lambda$ and $\mu$ the three variances reduce to

1) $V_1 = \dfrac{2\sigma_e^2}{Rk}\Big(k+\dfrac{R_2}{R_1}\mu^*\Big)$
2) $V_2 = \dfrac{2\sigma_e^2}{Rk}\Big(k+\dfrac{R_1}{R_2}\lambda^*\Big)$
3) $V_3 = \dfrac{2\sigma_e^2}{Rk}\Big(k+\dfrac{R_2}{R_1}\mu^*+\dfrac{R_1}{R_2}\lambda^*\Big)$

Since $\lambda^*$ and $\mu^*$ lie between zero and one, these variances will not differ
by much if $k$ is large and $R_1$ and $R_2$ are nearly equal. In this case, the
average variance, weighted with the number of comparisons of each
type, may suffice. Of the $k^2(k^2-1)/2$ comparisons, $k^2(k-1)/2$ are of type 1,
as many are of type 2, and the remaining $k^2(k-1)^2/2$ are of type 3, so that
$\bar V = [V_1+V_2+(k-1)V_3]/(k+1)$. This is found to be

4) $\bar V = \dfrac{2\sigma_e^2}{R(k+1)}\Big(k+\dfrac{R_2}{R_1}\mu^*+\dfrac{R_1}{R_2}\lambda^*+1\Big).$

In the above calculations the quantities $\lambda^*$ and $\mu^*$ are treated as
constants, whereas in fact they must be estimated from the
experimental data. When $k$ and $R$ are both small, the error involved may be
large. A preliminary investigation which will be reported later indicates
that when $k > 6$, the error will definitely be negligible.
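The following sketch evaluates the three variances for arbitrary $\lambda$ and $\mu$, together with their comparison-weighted average, and checks that the reduced forms hold at $\lambda^*$ and $\mu^*$. Only $V_1$ is written out in full above. The forms used for $V_2$ (same Y block, with $\lambda$, $R_1$ and $R_2$ interchanged) and for $V_3$ (both corrections at once, with no cross term) are our own extension by symmetry and should be read as assumptions. The function name is hypothetical.

```python
def difference_variances(sigma_e2, sigma_b2, R1, R2, k, lam=None, mu=None):
    """(V1, V2, V3, Vbar, lam, mu) for v''_ij = x_ij. + lam*C_x_i + mu*C_y_j.

    V1: same X block, V2: same Y block, V3: neither.  lam and mu default
    to the minimizing values lambda*, mu*.
    """
    R = R1 + R2
    if lam is None:
        lam = R2 * k * sigma_b2 / (R2 * k * sigma_b2 + R * sigma_e2)
    if mu is None:
        mu = R1 * k * sigma_b2 / (R1 * k * sigma_b2 + R * sigma_e2)
    c = 2 * sigma_e2 / (R * k)
    blk_y = 2 * (1 - mu) ** 2 * R2 * sigma_b2 / R ** 2    # Y-block error left by the mu correction
    blk_x = 2 * (1 - lam) ** 2 * R1 * sigma_b2 / R ** 2   # X-block error left by the lam correction
    v1 = c * (k + R2 / R1 * mu ** 2) + blk_y
    v2 = c * (k + R1 / R2 * lam ** 2) + blk_x
    v3 = c * (k + R2 / R1 * mu ** 2 + R1 / R2 * lam ** 2) + blk_x + blk_y
    vbar = (v1 + v2 + (k - 1) * v3) / (k + 1)   # weights: numbers of comparisons of each type
    return v1, v2, v3, vbar, lam, mu


se2, sb2, R1, R2, k = 1.0, 0.5, 3, 2, 6
v1, v2, v3, vbar, lam, mu = difference_variances(se2, sb2, R1, R2, k)
c = 2 * se2 / ((R1 + R2) * k)
print(abs(v1 - c * (k + R2 / R1 * mu)))   # ~0: the reduced form 1) for V1
# moving mu away from mu* can only increase V1
print(difference_variances(se2, sb2, R1, R2, k, mu=mu + 0.05)[0] > v1)
```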


F. Estimation of $\lambda^*$ and $\mu^*$

If the ratio of $\sigma_B^2$ to $\sigma_e^2$ were known, $\lambda^*$ and $\mu^*$ could be determined
exactly. In the absence of such knowledge, the ratio can be estimated
from the residual and blocks eliminating varieties sums of squares. The
optimum method of using this information has not been determined,
but the method originally given by Yates [12] seems to be satisfactory.
The blocks A and blocks B sums of squares are pooled and equated
to the resulting average value. Proceeding similarly with the residual
we now have two equations in the two unknowns, $\sigma_B^2$ and $\sigma_e^2$. The
solutions of these two equations are substituted into the formulas for $\lambda^*$
and $\mu^*$. If it should happen that the pooled block mean square is
actually smaller than the residual mean square, we take zero as our estimate
of $\lambda^*$ and $\mu^*$. The resulting estimates of $\lambda^*$ and $\mu^*$ are

$$\hat\lambda^* = \frac{b-e}{b+\frac{R_1-1}{R_2}\,e}\ \text{ if } b \ge e, \qquad \hat\lambda^* = 0\ \text{ if } b < e,$$

$$\hat\mu^* = \frac{b-e}{b+\frac{R_2-1}{R_1}\,e}\ \text{ if } b \ge e, \qquad \hat\mu^* = 0\ \text{ if } b < e,$$

where $b$ = pooled mean square for blocks eliminating varieties, and
$e$ = residual mean square. Since $b$ has average value $\frac{R-1}{R}k\sigma_B^2+\sigma_e^2$ and
$e$ has average value $\sigma_e^2$, the two equations give $\hat\sigma_e^2 = e$ and
$k\hat\sigma_B^2 = R(b-e)/(R-1)$. For the inter-block analysis the variances of
varietal differences are estimated as follows.

If $b \ge e$, we substitute the estimates $\hat\lambda^*$ and $\hat\mu^*$ in place of $\lambda^*$ and $\mu^*$
in the formulas on p. 795, and use the residual mean square as an
estimate of $\sigma_e^2$. If $b < e$, we replace $\lambda^*$ and $\mu^*$ by zero and use the pooled
mean square for blocks and residual to estimate $\sigma_e^2$.

For the intra-block analysis there is no need to estimate $\lambda^*$ and $\mu^*$.
We take $\lambda = \mu = 1$ in the formulas on p. 795 and use the residual mean
square to estimate $\sigma_e^2$.
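A minimal sketch of these estimating formulas follows. The function name is ours. As an illustration it is fed the pooled block and residual mean squares of the numerical example in part II ($R_1 = 3$, $R_2 = 2$, $k = 6$).

```python
def estimate_weights(b, e, R1, R2):
    """Estimates of lambda* and mu* (both taken as zero when b < e).

    b: pooled mean square for blocks eliminating varieties (blocks A + blocks B)
    e: residual mean square
    """
    if b < e:
        return 0.0, 0.0
    lam = (b - e) / (b + (R1 - 1) / R2 * e)
    mu = (b - e) / (b + (R2 - 1) / R1 * e)
    return lam, mu


b = (71.97 + 50.26) / 25     # pooled blocks mean square on R(k - 1) = 25 df
e = 322.83 / 115             # residual mean square
lam, mu = estimate_weights(b, e, R1=3, R2=2)
print(round(lam, 4), round(mu, 4))   # 0.2705 0.3574
```

With equal sets ($R_1 = R_2$) the two denominators coincide, so a single weight is obtained, as in the usual simple lattice.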

The intra-block estimates of the variances of varietal comparisons 
have the usual distribution of a mean square with the residual degrees 
of freedom. The distribution of the inter-block estimates is complex,
but it will be reasonably well approximated by a mean square distribu- 
tion with the same degrees of freedom. 


G. Efficiency of the Unequal Sets Designs 


The usual measure of efficiency for an experiment designed to com- 
pare several similar quantities is the reciprocal of the variance of the 
estimated difference between any two of them. When, as is the case in 
lattice designs, different comparisons do not all have the same variance, 
the usual practice is to use the reciprocal of the average variance of 
all possible comparisons. Since the variance of a comparison in a given 
experiment is proportional to the number of replications used, the 
above quantity divided by the number of replications is a measure of 
the intrinsic efficiency of the design. We need such an intrinsic criterion 
particularly in order to gauge the efficiency of the (2, 1) and (3, 2) 
designs relative to a lattice with equal sets of replications. 

The efficiency of a given design relative to an alternative will be
measured by the ratio of the above criteria for the two designs in
experiments for which $\sigma_B^2$ and $\sigma_e^2$ are the same for both designs. The
calculation of variances is straightforward and results in the following.


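AVERAGE VARIANCE OF VARIETAL COMPARISONS FOR
VARIOUS DESIGNS

Design                     Average variance of comparisons
Randomized Blocks          $\dfrac{2\sigma_e^2}{R(k+1)}\Big[k+\dfrac{k\sigma_B^2}{\sigma_e^2}+1\Big]$
Lattice with Equal Sets    $\dfrac{2\sigma_e^2}{R(k+1)}\Big[k+\dfrac{k\sigma_B^2}{\frac12 k\sigma_B^2+\sigma_e^2}+1\Big]$
Triple Lattice             $\dfrac{2\sigma_e^2}{R(k+1)}\Big[k+\dfrac{k\sigma_B^2}{\frac23 k\sigma_B^2+\sigma_e^2}+1\Big]$
Quintuple Lattice          $\dfrac{2\sigma_e^2}{R(k+1)}\Big[k+\dfrac{k\sigma_B^2}{\frac45 k\sigma_B^2+\sigma_e^2}+1\Big]$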

The triple and quintuple lattices are included for comparison with
the (2, 1) and (3, 2) designs. Comparing the average variance of
comparisons in the unequal sets lattice with the variances listed we find the
following relative efficiencies.
EFFICIENCY OF UNEQUAL SETS LATTICE RELATIVE TO
ALTERNATIVE DESIGNS

Alternative design         Relative efficiency* of $(R_1, R_2)$ simple lattice
Randomized Blocks          $\dfrac{k+k\sigma_B^2/\sigma_e^2+1}{k+\frac{R_1}{R_2}\lambda^*+\frac{R_2}{R_1}\mu^*+1}$
Lattice with Equal Sets    $\dfrac{k+\dfrac{k\sigma_B^2}{\frac12 k\sigma_B^2+\sigma_e^2}+1}{k+\frac{R_1}{R_2}\lambda^*+\frac{R_2}{R_1}\mu^*+1}$
Triple Lattice             $\dfrac{k+\dfrac{k\sigma_B^2}{\frac23 k\sigma_B^2+\sigma_e^2}+1}{k+\frac{R_1}{R_2}\lambda^*+\frac{R_2}{R_1}\mu^*+1}$
Quintuple Lattice          $\dfrac{k+\dfrac{k\sigma_B^2}{\frac45 k\sigma_B^2+\sigma_e^2}+1}{k+\frac{R_1}{R_2}\lambda^*+\frac{R_2}{R_1}\mu^*+1}$

* These relative efficiencies are calculated without allowance for the inaccuracy of weighting. This discriminates against the randomized block design which is, in fact, somewhat more efficient than the alternatives (rather than slightly less) when $\sigma_B^2$ is very small.


The relative efficiencies using the intra-block analysis may be
obtained by replacing $\lambda^*$ and $\mu^*$ by one and, in the case of the alternative
lattices, taking the limit as $\sigma_B^2/\sigma_e^2$ becomes large. For the randomized
block design there is no intra-block analysis and the efficiency remains
a function of $\sigma_B^2/\sigma_e^2$.

In the comparisons with other lattice designs it will be noted that
the relative efficiencies approach one when $k$ is large, or when $\sigma_B^2$
approaches zero and the inter-block analysis is used. It can also be
verified by straightforward algebra that the least favorable values for
the unequal sets designs occur when $\sigma_B^2$ is large, or when the
intra-block analysis is used. Since $k = 5$ is frequently taken as the lower limit
for reasonable accuracy in the recovery of inter-block information, e.g.
[2], we will examine this case in detail.

We note first that our unequal sets design is the least efficient of the
lattices compared. This is only to be expected since it is the furthest
from balance. However, compared with the equal sets design it does
not do at all badly. In the worst case, that of large $\sigma_B^2/\sigma_e^2$ or the
intra-block analysis, the efficiency relative to the equal sets design becomes

$$\frac{k+3}{k+\frac{R_1}{R_2}+\frac{R_2}{R_1}+1}.$$

Thus, for a 5×5 lattice the (2, 1) design is at worst 94 per cent efficient
and the (3, 2) design 98 per cent efficient. These efficiencies will
improve with larger $k$, as noted earlier. Hence, to a fair approximation,
the efficiency of unequal sets designs relative to alternatives is about
the same as that of the equal sets designs, provided the inequality in
the number of X- and Y-replications is not excessive.

The advantage of the unequal sets designs lies in the simplicity of 
the calculations required, these being no more arduous than for the 
lattice with equal sets. In the triple and quintuple lattices, on the 
other hand, the additional labor involved in finding the adjusted means 
may be considerable. In circumstances where the cost of analysis is an
appreciable portion of the cost of experimentation, the unequal sets 
designs may be regarded as competitors of the more balanced, but 
more complex, triple and quintuple lattice designs. The least favorable 
situation for the unequal sets designs relative to the triple and quin- 
tuple lattices is again found to be the intra-block analysis with k small. 
We see directly that using the intra-block analysis with k=5, the (2, 1) 
design is 88 per cent efficient relative to the triple lattice and the 
(3, 2) design is 89 per cent efficient relative to the quintuple lattice. 
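The worst-case percentages quoted above can be reproduced with a short sketch (the function names are ours). It assumes that each design's average variance of comparisons has the form $\frac{2\sigma_e^2}{R(k+1)}B$, with the bracket $B$ equal to $k + k\sigma_B^2/\sigma_e^2 + 1$ for randomized blocks, $k + \frac{k\sigma_B^2}{\frac12 k\sigma_B^2+\sigma_e^2} + 1$ for equal sets, the same with $\frac23$ and $\frac45$ for the triple and quintuple lattices, and $k + \frac{R_1}{R_2}\lambda^* + \frac{R_2}{R_1}\mu^* + 1$ for the unequal sets lattice. The intra-block case is taken as the limit $t = k\sigma_B^2/\sigma_e^2 \to \infty$.

```python
import math


def bracket(design, k, t, R1=None, R2=None):
    """B in: average variance = 2 sigma_e^2 / (R (k + 1)) * B.

    t = k sigma_B^2 / sigma_e^2; t = math.inf gives the intra-block limit.
    """
    def recovered(frac):  # k sigma_B^2 / (frac * k sigma_B^2 + sigma_e^2)
        return 1 / frac if math.isinf(t) else t / (frac * t + 1)

    if design == "randomized blocks":
        return k + t + 1
    if design == "unequal":
        R = R1 + R2
        lam = 1.0 if math.isinf(t) else R2 * t / (R2 * t + R)
        mu = 1.0 if math.isinf(t) else R1 * t / (R1 * t + R)
        return k + R1 / R2 * lam + R2 / R1 * mu + 1
    frac = {"equal sets": 1 / 2, "triple": 2 / 3, "quintuple": 4 / 5}[design]
    return k + recovered(frac) + 1


def efficiency(alt, k, t, R1, R2):
    """Efficiency of the (R1, R2) unequal sets lattice relative to design `alt`."""
    return bracket(alt, k, t) / bracket("unequal", k, t, R1, R2)


k, t = 5, math.inf   # worst case: intra-block analysis, 5 x 5 lattice
for alt, (R1, R2) in [("equal sets", (2, 1)), ("equal sets", (3, 2)),
                      ("triple", (2, 1)), ("quintuple", (3, 2))]:
    print(alt, (R1, R2), round(100 * efficiency(alt, k, t, R1, R2)))
```

The printed percentages are 94, 98, 88 and 89. A finite `t` gives the inter-block comparison for a given block variability.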

If it is known in advance that the block variability is not excessive,
we may be assured of somewhat greater efficiency. The usual measure
of block variability is the ratio of the variance between blocks to that
within blocks, which is

$$\gamma = \frac{k\sigma_B^2+\sigma_e^2}{\sigma_e^2}$$

(or $w/w'$ in the notation of [2] and [12]). We see that the efficiencies
can be expressed as functions of $\gamma$. In the above two cases the maximum
loss in efficiency will be reduced by at least half if $\gamma$ is no greater than
five.


II. NUMERICAL EXAMPLE³


The material used in the example is drawn from two separate
experiments at different locations. Both experiments used the same
varieties and design, each with two X- and two Y-replications. To construct
an example of a (3, 2) lattice, an X-replication from the second
experiment was added to the first experiment, giving three X- and two
Y-replications. Any interpretation of our numerical results must bear in
mind the artificial nature of the "experiment" analyzed. The crop in
question was corn and the variable was field weight of grain in pounds.
Each plot consisted of 30 (= 2 × 15) single plant hills.

³ The writer is indebted for the experimental data to Dr. P. H. Harvey, Agronomist, Bureau of Plant Industry, U. S. Dept. of Agronomy, North Carolina State College, Raleigh, N. C. The material was transmitted to the writer through the courtesy of Dr. R. J. Monroe, North Carolina State College, Raleigh, N. C.

Table I presents the basic data with the totals for each block. In 
Table II we have the X-replications combined and the Y-replications 
combined, with row and column totals in each case. Table III gives the 
grand total for each variety with the row and column totals. 


TABLE I
SINGLE PLOT YIELDS
[Table: plot yields and block totals for the X-replications and the Y-replications]





TABLE II
COMBINATION OF REPLICATIONS

                                      1       2       3       4       5       6      Total
X = X1+X2+X3   Row totals           278.6   274.2   289.9   261.6   246.8   287.1   1638.2
               Column totals        266.8   282.4   291.0   255.8   278.7   263.5   1638.2
               $RR_2kC'_{x_i}$       55.7   100.8    10.9    84.3    68.9    45.3    365.9
Y = Y1+Y2      Row totals           200.5   206.8   220.3   203.8   191.3   191.4   1214.1
               Column totals        204.3   216.4   196.9   202.5   187.5   206.5   1214.1
               $RR_1kC'_{y_j}$      −67.9   −55.6   −78.9   −99.8   −16.5   −47.2   −365.9
A. The Analysis of Variance

The analysis proceeds in the usual way as follows. The correction
term is given by

$$C = \frac{(x_{+++})^2}{Rk^2} = \frac{(2852.3)^2}{180} = 45197.86.$$

The total sum of squares is obtained by summing the squares of each
plot yield and subtracting the correction term.

$$\text{Total SS} = \sum_{ijr}(x_{ijr})^2 - C = (14.2)^2+(16.0)^2+\cdots+(14.1)^2 - C = 46258.27 - 45197.86 = 1060.41.$$


TABLE III
VARIETY TOTAL YIELDS

Row totals        482.9   490.6   486.8   464.1   434.3   493.6
Column totals     467.3   489.2   511.3   459.6   470.0   454.9    Grand total 2852.3





Similarly,

$$\text{Replications SS} = \frac{1}{k^2}\sum_r(x_{++r})^2 - C = \frac{1}{36}\big[(544.4)^2+(530.0)^2+(563.8)^2+(616.9)^2+(597.2)^2\big] - C = 45343.20 - 45197.86 = 145.34,$$

$$\text{Varieties ignoring blocks SS} = \frac{1}{R}\sum_{ij}(x_{ij+})^2 - C = \frac{1}{5}\big[(73.5)^2+\cdots+(68.1)^2\big] - C = 45667.87 - 45197.86 = 470.01.$$
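The correction term and the replications sum of squares need only the five replication totals. Here is a minimal sketch using the totals quoted above (the variable names are ours):

```python
R, k = 5, 6
rep_totals = [544.4, 530.0, 563.8, 616.9, 597.2]   # X1, X2, X3, Y1, Y2
grand_total = sum(rep_totals)                      # 2852.3
C = grand_total ** 2 / (R * k ** 2)                # correction term
replications_ss = sum(t ** 2 for t in rep_totals) / k ** 2 - C
print(f"C = {C:.2f}, replications SS = {replications_ss:.2f}")
```

The varieties sum of squares follows the same pattern with the 36 variety totals and divisor $R$ in place of $k^2$.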
The blocks A sum of squares is computed in two parts, one from the
X-replications (blocks A′) and one from the Y-replications (blocks A″).

$$
\begin{aligned}
\text{Blocks A}'\ \text{SS} &= \frac{1}{k}\sum_{r=1}^{R_1}\sum_i(x_{i+r})^2 - \frac{1}{k^2}\sum_{r=1}^{R_1}(x_{++r})^2 - \frac{1}{R_1k}\sum_i\Big(\sum_{r=1}^{R_1}x_{i+r}\Big)^2 + \frac{1}{R_1k^2}\Big(\sum_{r=1}^{R_1}x_{++r}\Big)^2\\
&= \frac{1}{6}\big[(94.8)^2+\cdots+(98.2)^2\big] - \frac{1}{36}\big[(544.4)^2+(530.0)^2+(563.8)^2\big] - \frac{1}{18}\big[(278.6)^2+\cdots+(287.1)^2\big] + \frac{1}{108}(1638.2)^2\\
&= 13.77.
\end{aligned}
$$

X-REPLICATION BLOCK TOTALS
Total    544.4    530.0    563.8    1638.2





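$$
\begin{aligned}
\text{Blocks A}''\ \text{SS} &= \frac{1}{k}\sum_{r=R_1+1}^{R}\sum_j(x_{+jr})^2 - \frac{1}{k^2}\sum_{r=R_1+1}^{R}(x_{++r})^2 - \frac{1}{R_2k}\sum_j\Big(\sum_{r=R_1+1}^{R}x_{+jr}\Big)^2 + \frac{1}{R_2k^2}\Big(\sum_{r=R_1+1}^{R}x_{++r}\Big)^2\\
&= \frac{1}{6}\big[(109.1)^2+\cdots+(98.5)^2\big] - \frac{1}{36}\big[(616.9)^2+(597.2)^2\big] - \frac{1}{12}\big[(200.5)^2+\cdots+(191.4)^2\big] + \frac{1}{72}(1214.1)^2\\
&= 58.20.
\end{aligned}
$$

Y-REPLICATION BLOCK TOTALS

Block     Y1       Y2      Total
1        109.1     91.4    200.5
2        109.8     97.0    206.8
3        108.8    111.5    220.3
4        106.8     97.0    203.8
5         89.5    101.8    191.3
6         92.9     98.5    191.4
Total    616.9    597.2   1214.1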





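In the case of blocks A″ we have only two replications, which permits
a simplification of the computation:

$$
\begin{aligned}
\text{Blocks A}''\ \text{SS} &= \frac{1}{R_2k}\sum_j(x_{+j4}-x_{+j5})^2 - \frac{1}{R_2k^2}(x_{++4}-x_{++5})^2\\
&= \frac{1}{2\times6}\big[(17.7)^2+(12.8)^2+\cdots+(-5.6)^2\big] - \frac{(19.7)^2}{2\times36}\\
&= 58.20.
\end{aligned}
$$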
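Both routes to blocks A″ can be checked from the Y-replication block totals. The sketch below (our own function names) evaluates the general four-term formula and the two-replication shortcut. They agree.

```python
k = 6
y1 = [109.1, 109.8, 108.8, 106.8, 89.5, 92.9]   # Y1 block totals, blocks j = 1..6
y2 = [91.4, 97.0, 111.5, 97.0, 101.8, 98.5]     # Y2 block totals


def blocks_a_general(reps, k):
    """Blocks-within-replications SS from block totals (one list per replication)."""
    n = len(reps)
    combined = [sum(col) for col in zip(*reps)]
    return (sum(t ** 2 for rep in reps for t in rep) / k
            - sum(sum(rep) ** 2 for rep in reps) / k ** 2
            - sum(t ** 2 for t in combined) / (n * k)
            + sum(combined) ** 2 / (n * k ** 2))


def blocks_a_two_reps(a, b, k):
    """Shortcut valid when there are only two replications."""
    d = [p - q for p, q in zip(a, b)]
    return sum(t ** 2 for t in d) / (2 * k) - sum(d) ** 2 / (2 * k ** 2)


general = blocks_a_general([y1, y2], k)
shortcut = blocks_a_two_reps(y1, y2, k)
print(round(general, 2), round(shortcut, 2))
```

The general function applies unchanged to blocks A′, given the X-replication block totals.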


The blocks B sum of squares is also computed in two parts. It will
be noted that the quantities to be squared are proportional to the
unadjusted correction terms for blocks, $C_{x_i}$ and $C_{y_j}$. In this computation
the terms in $\bar{x}_{\cdot\cdot}-\tilde{x}_{\cdot\cdot}$ may be ignored, as the calculation corrects for
means automatically. Thus in place of $C_{x_i}$ and $C_{y_j}$ it is more
convenient to use

$$RR_2kC'_{x_i} = R_1(R_2k\,\tilde{x}_{i\cdot}) - R_2(R_1k\,\bar{x}_{i\cdot}) = R_1(i\text{th column total for group Y}) - R_2(i\text{th row total for group X})$$

and similarly,

$$RR_1kC'_{y_j} = R_2(j\text{th column total for group X}) - R_1(j\text{th row total for group Y}).$$

For example: $RR_2kC'_{x_1} = 3(204.3) - 2(278.6) = 55.7$.

In this manner, we obtain:

          $RR_2kC'_{x_i}$    $RR_1kC'_{y_j}$
1              55.7              −67.9
2             100.8              −55.6
3              10.9              −78.9
4              84.3              −99.8
5              68.9              −16.5
6              45.3              −47.2
Total         365.9             −365.9

As a computational check, it may be noted that the sum of the
$RR_2kC'_{x_i}$ is just $R_1$(sum of all Y plots) − $R_2$(sum of all X plots), here
3(1214.1) − 2(1638.2) = 365.9. The sum of the $RR_1kC'_{y_j}$ is the negative of this.

We may now write the sum of squares for blocks B.

$$
\begin{aligned}
\text{Blocks B}'\ \text{SS} &= \frac{1}{RR_1R_2k}\sum_i(RR_2kC'_{x_i})^2 - \frac{1}{RR_1R_2k^2}\Big(\sum_i RR_2kC'_{x_i}\Big)^2\\
&= \frac{(55.7)^2+(100.8)^2+\cdots+(45.3)^2}{5\times3\times2\times6} - \frac{(365.9)^2}{5\times3\times2\times36} = 27.63,\\
\text{Blocks B}''\ \text{SS} &= \frac{1}{RR_1R_2k}\sum_j(RR_1kC'_{y_j})^2 - \frac{1}{RR_1R_2k^2}\Big(\sum_j RR_1kC'_{y_j}\Big)^2\\
&= \frac{(67.9)^2+(55.6)^2+\cdots+(47.2)^2}{5\times3\times2\times6} - \frac{(365.9)^2}{5\times3\times2\times36} = 22.63.
\end{aligned}
$$

The residual sum of squares may now be obtained by difference:

Residual SS = 1060.41 − (145.34 + 470.01 + 13.77 + 58.20 + 27.63 + 22.63)
= 322.83.
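The blocks B computation can be sketched from the margins alone. Below, the $RR_2kC'_{x_i}$ are rebuilt from the X block totals and the Y totals for the same index $i$ with $R_1(\cdot) - R_2(\cdot)$. The $RR_1kC'_{y_j}$ are taken as printed, and the residual then follows by difference. The variable names are ours.

```python
R1, R2, k = 3, 2, 6
R = R1 + R2
x_rows = [278.6, 274.2, 289.9, 261.6, 246.8, 287.1]   # X block totals, summed over X-replications
y_cols = [204.3, 216.4, 196.9, 202.5, 187.5, 206.5]   # Y totals for the same index i
cx = [R1 * yc - R2 * xr for yc, xr in zip(y_cols, x_rows)]   # R R2 k C'_x_i
cy = [-67.9, -55.6, -78.9, -99.8, -16.5, -47.2]              # R R1 k C'_y_j, as printed


def blocks_b_part(c):
    """One half of the blocks B SS from the scaled, uncentred corrections."""
    d = R * R1 * R2 * k
    return sum(t ** 2 for t in c) / d - sum(c) ** 2 / (d * k)


b_x, b_y = blocks_b_part(cx), blocks_b_part(cy)
residual = 1060.41 - (145.34 + 470.01 + 13.77 + 58.20 + round(b_x, 2) + round(b_y, 2))
print([round(t, 1) for t in cx], round(b_x, 2), round(b_y, 2), round(residual, 2))
```

Because the second term subtracts the mean of the uncentred quantities, the terms in $\bar{x}_{\cdot\cdot}-\tilde{x}_{\cdot\cdot}$ indeed drop out.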


The above calculations give the analysis of variance table as follows.

Source of Variation          Degrees of Freedom    Sum of Squares    Mean Square
Replications                          4                 145.34           36.335
Varieties ignoring blocks            35                 470.01           13.429
Blocks A                             15                  71.97            4.798
Blocks B                             10                  50.26            5.026
Residual                            115                 322.83            2.807
Total                               179                1060.41





B. Tests of Significance

A simple approximate F-test for varietal effects is made by combining
the blocks and residual sums of squares, giving an error term of

71.97 + 50.26 + 322.83 = 445.06 on 140 df.,

or a mean square of 3.179. This is to be compared with varieties
ignoring blocks, giving

$$F = \frac{13.429}{3.179} = 4.224 \text{ on 35 and 140 df.},$$

which is highly significant. The test is not exact because the error term
is a mixture of mean squares. The error in significance level caused by
using this approximation has been investigated by Cochran [1]. In
a case such as this the error will be quite small.
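The approximate test pools the three error lines of the analysis of variance table before forming the ratio. Here is a minimal sketch with the table's figures (a dictionary of our own devising):

```python
anova = {                                  # source: (sum of squares, degrees of freedom)
    "varieties ignoring blocks": (470.01, 35),
    "blocks A": (71.97, 15),
    "blocks B": (50.26, 10),
    "residual": (322.83, 115),
}
pooled = [anova[s] for s in ("blocks A", "blocks B", "residual")]
err_ss = sum(s for s, _ in pooled)
err_df = sum(df for _, df in pooled)
ss_v, df_v = anova["varieties ignoring blocks"]
F = (ss_v / df_v) / (err_ss / err_df)
print(round(err_ss, 2), err_df, round(err_ss / err_df, 3), round(F, 3))
```

The significance level would then be read from the F distribution on 35 and 140 degrees of freedom, subject to the approximation discussed above.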

If an exact F-test is required, we must perform a little extra
computation. We require the sum of squares for varieties eliminating blocks,
which is most easily found from the identity

SS (varieties eliminating blocks) + SS (blocks ignoring varieties)
= SS (varieties ignoring blocks) + SS (blocks eliminating varieties).

Now

$$
\begin{aligned}
\text{SS (blocks ignoring varieties)} &= \Big[\frac{1}{k}\sum_{r=1}^{R_1}\sum_i(x_{i+r})^2 - \frac{1}{k^2}\sum_{r=1}^{R_1}(x_{++r})^2\Big] + \Big[\frac{1}{k}\sum_{r=R_1+1}^{R}\sum_j(x_{+jr})^2 - \frac{1}{k^2}\sum_{r=R_1+1}^{R}(x_{++r})^2\Big]\\
&= \frac{1}{6}\big[(94.8)^2+\cdots+(98.2)^2\big] - \frac{1}{36}\big[(544.4)^2+(530.0)^2+(563.8)^2\big]\\
&\quad + \frac{1}{6}\big[(109.1)^2+\cdots+(98.5)^2\big] - \frac{1}{36}\big[(616.9)^2+(597.2)^2\big]\\
&= 195.19.
\end{aligned}
$$

The SS (blocks eliminating varieties) is the sum of the blocks A and blocks B
lines, 71.97 + 50.26 = 122.23. Hence

SS (varieties eliminating blocks) = 470.01 + 122.23 − 195.19 = 397.05 on 35 df.,

and the corresponding mean square is 11.344. This is to be tested
against the residual mean square, giving

$$F = \frac{11.344}{2.807} = 4.041 \text{ on 35 and 115 df.},$$

which is also highly significant.
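The blocks-ignoring-varieties sum of squares equals blocks A plus the between-block-index part computed from the combined block totals, that is, $kR_1\sum_i(\bar{x}_{i\cdot}-\bar{x}_{\cdot\cdot})^2 + kR_2\sum_j(\tilde{x}_{\cdot j}-\tilde{x}_{\cdot\cdot})^2$. The sketch below uses this as an independent check on 195.19 and then carries out the exact test. The helper name `between` is ours.

```python
k, R1, R2 = 6, 3, 2
x_rows = [278.6, 274.2, 289.9, 261.6, 246.8, 287.1]   # X block totals over the X-replications
y_rows = [200.5, 206.8, 220.3, 203.8, 191.3, 191.4]   # Y block totals over the Y-replications


def between(totals, n_reps):
    """Between-block-index SS from block totals summed over n_reps replications."""
    return sum(t ** 2 for t in totals) / (n_reps * k) - sum(totals) ** 2 / (n_reps * k ** 2)


blocks_a, blocks_b = 71.97, 50.26
blocks_ignoring = blocks_a + between(x_rows, R1) + between(y_rows, R2)
var_elim = 470.01 + (blocks_a + blocks_b) - blocks_ignoring
F = (var_elim / 35) / (322.83 / 115)
print(round(blocks_ignoring, 2), round(var_elim, 2), round(F, 3))
```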
C. Estimation of Varietal Means

If the intra-block analysis is to be used, the estimation of varietal
means is now trivial. We need merely adjust the raw varietal means by
adding the quantities $C_{x_i}$ and $C_{y_j}$, i.e.

$$v'_{ij} = x_{ij\cdot} + C_{x_i} + C_{y_j}.$$

The use of $C'_{x_i}$ and $C'_{y_j}$ in place of $C_{x_i}$ and $C_{y_j}$ introduces a constant
bias which does not affect the comparisons and is generally quite
small. The bias is readily removed by adding the quantity
$-\frac{1}{k}(C'_{x+}+C'_{y+})$, which in our case is equal to −0.34.

To calculate the inter-block estimates of the varietal means, we
must first estimate the coefficients $\lambda^*$ and $\mu^*$. The formulas for these
estimates appear on p. 796 and reduce in this example ($R_1 = 3$, $R_2 = 2$) to

$$\hat\lambda^* = \frac{b-e}{b+e}, \qquad \hat\mu^* = \frac{b-e}{b+\frac13 e},$$

where $b$ = pooled mean square for blocks eliminating varieties and
$e$ = residual mean square. In this example

$$b = \frac{71.97+50.26}{25} = 4.889$$

and $e = 2.807$, so that

$$\hat\lambda^* = 0.2705 \quad\text{and}\quad \hat\mu^* = 0.3574.$$

Multiples of the unadjusted correction terms have already been
calculated, namely, $RR_2kC'_{x_i}$ and $RR_1kC'_{y_j}$. The adjusted correction
terms are found by multiplying these quantities by

$$\frac{\hat\lambda^*}{RR_2k} = \frac{0.2705}{60} = 0.004503 \quad\text{and}\quad \frac{\hat\mu^*}{RR_1k} = \frac{0.3574}{90} = 0.003971,$$

respectively. (Replace $\hat\lambda^*$ and $\hat\mu^*$ by one for the intra-block estimates.)
These corrections are appended to the table of varietal means (Table
IV) and the appropriate terms added to each raw varietal mean,
yielding the corrected means (Table V).

The additional adjustment due to using $C'_{x_i}$ and $C'_{y_j}$ in place of
$C_{x_i}$ and $C_{y_j}$ is

$$-\frac{1}{k}\big(\hat\lambda^*C'_{x+}+\hat\mu^*C'_{y+}\big) = -0.03,$$

which may be added to each varietal mean, if desired.

It may be of interest to point out that the need for these corrections
is peculiar to the unequal sets designs. The corrections are identically
zero in an equal sets design.
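The adjustment of a row of raw means can be sketched as follows (the variable names are ours). The sketch takes row 1 of Table IV (14.70, 16.52, 16.38, 16.00, 16.18, 16.80) and the scaled corrections $RR_2kC'_{x_i}$ and $RR_1kC'_{y_j}$. It multiplies them by $\hat\lambda^*/(RR_2k)$ and $\hat\mu^*/(RR_1k)$ computed directly from the estimates, and compares the result with the first five entries of row 1 of Table V. It also evaluates both optional constant adjustments.

```python
R1, R2, k = 3, 2, 6
R = R1 + R2
lam, mu = 0.2705, 0.3574                                  # estimates from the text
cx = [55.7, 100.8, 10.9, 84.3, 68.9, 45.3]                # R R2 k C'_x_i
cy = [-67.9, -55.6, -78.9, -99.8, -16.5, -47.2]           # R R1 k C'_y_j
raw_row1 = [14.70, 16.52, 16.38, 16.00, 16.18, 16.80]     # Table IV, i = 1
fx, fy = lam / (R * R2 * k), mu / (R * R1 * k)            # multipliers for the scaled corrections
adj_row1 = [m + fx * cx[0] + fy * c for m, c in zip(raw_row1, cy)]
bias_inter = -(fx * sum(cx) + fy * sum(cy)) / k           # optional constant, inter-block
bias_intra = -(sum(cx) / (R * R2 * k) + sum(cy) / (R * R1 * k)) / k   # lambda = mu = 1
print([round(a, 3) for a in adj_row1], round(bias_inter, 2), round(bias_intra, 2))
```

The computed multipliers differ from the rounded ones in the last figure, so the adjusted means can differ from Table V by a unit in the third decimal.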


TABLE IV
VARIETY MEANS (UNADJUSTED)

14.70    16.52    16.38    16.00    16.18    16.80
15.48    17.18    19.76    16.12    13.48    16.10
15.52    19.70    17.02    13.76    16.48    14.86
16.54    15.84    16.22    12.38    15.96    15.88
14.04    12.78    16.00    15.38    14.94    13.72
17.18    15.82    16.86    18.28    16.96    13.62

$\hat\mu^*C'_{y_j}$:  −0.270   −0.221   −0.313   −0.396   −0.066   −0.187





TABLE V
VARIETY MEANS (ADJUSTED)

14.681    16.550    16.318    15.855    16.365
15.664    17.383    19.901    16.178    13.868
15.299    19.528    16.756    13.413    16.463
16.650    15.999    16.287    12.364    16.274
14.081    12.870    15.998    15.295    15.185
17.114    15.803    16.751    18.088    17.098





D. Variances and Standard Errors of Varietal Differences

The variances of varietal differences depend on whether or not the
varieties being compared appear together in some block. We give first
the variances and standard errors for the intra-block analysis, using
the formulas on p. 795 with $\lambda^*$ and $\mu^*$ replaced by one.

Intra-block Variances and Standard Errors

a) Varieties in same X block
$$V_1' = \operatorname{Var}\{v_{11}'-v_{12}'\} = \frac{2\times2.807}{5\times6}\Big[6+\frac{2}{3}\Big] = 1.248, \qquad \text{S.E.}\{v_{11}'-v_{12}'\} = \sqrt{1.248} = 1.117$$

b) Varieties in same Y block
$$V_2' = \operatorname{Var}\{v_{11}'-v_{21}'\} = \frac{2\times2.807}{5\times6}\Big[6+\frac{3}{2}\Big] = 1.404, \qquad \text{S.E.}\{v_{11}'-v_{21}'\} = \sqrt{1.404} = 1.185$$

c) Varieties not in same block
$$V_3' = \operatorname{Var}\{v_{11}'-v_{22}'\} = \frac{2\times2.807}{5\times6}\Big[6+\frac{2}{3}+\frac{3}{2}\Big] = 1.528, \qquad \text{S.E.}\{v_{11}'-v_{22}'\} = \sqrt{1.528} = 1.236$$

d) Average of all comparisons
$$\bar V' = \text{Av. Var} = \frac{2\times2.807}{5\times7}\Big[6+\frac{2}{3}+\frac{3}{2}+1\Big] = 1.470, \qquad \text{Av. S.E.} = \sqrt{1.470} = 1.212.$$

For most purposes the average standard error, 1.212, would be adequate.

The variances appropriate to the inter-block analysis are also derived
from the formulas on p. 795, using the estimates $\hat\lambda^*$ and $\hat\mu^*$ of $\lambda^*$
and $\mu^*$.

Inter-block Variances and Standard Errors

a) Varieties in same X block
$$V_1'' = \operatorname{Var}\{v_{11}''-v_{12}''\} = \frac{2\times2.807}{5\times6}\Big[6+\frac{2}{3}(0.3574)\Big] = 1.167, \qquad \text{S.E.}\{v_{11}''-v_{12}''\} = \sqrt{1.167} = 1.080$$

b) Varieties in same Y block
$$V_2'' = \operatorname{Var}\{v_{11}''-v_{21}''\} = \frac{2\times2.807}{5\times6}\Big[6+\frac{3}{2}(0.2705)\Big] = 1.199, \qquad \text{S.E.}\{v_{11}''-v_{21}''\} = \sqrt{1.199} = 1.095$$

c) Varieties not in same block
$$V_3'' = \operatorname{Var}\{v_{11}''-v_{22}''\} = \frac{2\times2.807}{5\times6}\Big[6+\frac{2}{3}(0.3574)+\frac{3}{2}(0.2705)\Big] = 1.243, \qquad \text{S.E.}\{v_{11}''-v_{22}''\} = \sqrt{1.243} = 1.115$$

d) Average of all comparisons
$$\bar V'' = \text{Av. Var} = \frac{2\times2.807}{5\times7}\Big[6+\frac{2}{3}(0.3574)+\frac{3}{2}(0.2705)+1\Big] = 1.226, \qquad \text{Av. S.E.} = \sqrt{1.226} = 1.107.$$


Again, for most purposes, the average standard error would be ade- 
quate. We see that the estimated average variance in the inter-block 
analysis is 17 per cent less than the estimated average variance in the 
intra-block analysis. 
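All eight variances above come from the reduced formulas with $\sigma_e^2$ estimated by 2.807. Here is a minimal sketch that recomputes them, the average standard errors, and the relative reduction of the average variance (our own names):

```python
import math

e, R1, R2, k = 2.807, 3, 2, 6
R = R1 + R2


def variances(lam, mu):
    """V1 (same X block), V2 (same Y block), V3 (neither), average variance."""
    c = 2 * e / (R * k)
    v1 = c * (k + R2 / R1 * mu)
    v2 = c * (k + R1 / R2 * lam)
    v3 = c * (k + R2 / R1 * mu + R1 / R2 * lam)
    vbar = 2 * e / (R * (k + 1)) * (k + R2 / R1 * mu + R1 / R2 * lam + 1)
    return v1, v2, v3, vbar


intra = variances(1.0, 1.0)           # lambda = mu = 1
inter = variances(0.2705, 0.3574)     # estimated lambda*, mu*
for name, vs in (("intra", intra), ("inter", inter)):
    print(name, [round(v, 3) for v in vs], [round(math.sqrt(v), 3) for v in vs])
print("reduction:", round(1 - inter[3] / intra[3], 3))
```

The reduction printed is about 0.166, which the text rounds to 17 per cent.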

Due to the fact that $\hat\lambda^*$ and $\hat\mu^*$ are estimates rather than true values
the above variance estimates tend to be a little too small, i.e. they have
a negative bias. An approximate adjustment made to remove this bias
changes the estimated standard errors in this case by less than one per
cent [9].


E. Efficiency Relative to Alternative Designs

The formulas used to gauge relative efficiency are given on p. 798.
In using them we will estimate $\sigma_e^2$ by the residual mean square, 2.807,
and $k\sigma_B^2$ by the quantity $(b-e)R/(R-1) = (4.889-2.807)\times5/4 = 2.603$.
The estimated average variance for randomized blocks is then
$\frac{2\times2.807}{5\times7}\big(6+\frac{2.603}{2.807}+1\big) = 1.272$.

The estimated efficiency of this particular experiment using
intra-block analysis relative to randomized blocks is

$$\frac{1.272}{1.470} = 0.865 \text{ or } 86.5\%,$$

a loss of about 13 per cent. If the inter-block analysis is used, the
relative efficiency becomes

$$\frac{1.272}{1.226} = 1.038 \text{ or } 103.8\%,$$

a gain of not quite 4 per cent.

The efficiency relative to an equal sets design is very close to 100 per
cent by either method of analysis. Using the intra-block analysis the
efficiency is

$$\frac{6+3}{6+\frac{3}{2}+\frac{2}{3}+1} = 0.982 \text{ or } 98.2\%.$$

Using the inter-block analysis we calculate an efficiency of 99.86 per
cent.
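The efficiency figures of this section can be recomputed from the two variance-component estimates. The sketch below (our own names) takes $\sigma_e^2 = 2.807$ and $k\sigma_B^2 = 2.603$ and uses the average-variance brackets for randomized blocks, the equal sets lattice and the (3, 2) lattice, with the bracket $k + \frac{k\sigma_B^2}{\frac12 k\sigma_B^2+\sigma_e^2} + 1$ assumed for equal sets under inter-block analysis.

```python
e, kb = 2.807, 2.603            # sigma_e^2 and k*sigma_B^2, as estimated in the text
R1, R2, k = 3, 2, 6
lam, mu = 0.2705, 0.3574
R = R1 + R2
scale = 2 * e / (R * (k + 1))   # average variance = scale * bracket

rb = scale * (k + kb / e + 1)                                   # randomized blocks
unequal_intra = scale * (k + R1 / R2 + R2 / R1 + 1)
unequal_inter = scale * (k + R1 / R2 * lam + R2 / R1 * mu + 1)
equal_intra = scale * (k + 2 + 1)
equal_inter = scale * (k + kb / (kb / 2 + e) + 1)

print(round(rb, 3))
print(round(rb / unequal_intra, 3), round(rb / unequal_inter, 3))
print(round(equal_intra / unequal_intra, 3), round(equal_inter / unequal_inter, 4))
```

Working with unrounded variances gives 1.037 for the inter-block ratio against randomized blocks, rather than the 1.038 obtained from the rounded figures 1.272 and 1.226.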

The quintuple lattice does not exist for a 6×6 design, so there is no
point in making that comparison for a lattice experiment with 36
varieties.


F. Conclusions on the Numerical Example 


If the above findings were made relative to an actual experiment 
instead of our synthetic one, one might draw the following conclusions. 

a) The lattice design did not appreciably improve the accuracy of 
the experiment relative to what might have been expected from a 
randomized blocks design. 

b) The use of the inter-block analysis has saved the experiment from 
a considerable loss (13 per cent) relative to a randomized blocks design. 

c) The loss of efficiency due to using unequal sets of replications 
was negligible. 


[Note: In the two original experiments from which our data were taken, 
the apparent gains relative to randomized blocks (using inter-block 
analysis) were 11 per cent in the first experiment (from which we took 
four replications) and 3 per cent in the second (from which we took one 
replication).] 
ON THE PRESENTATION OF THE RESULTS OF SAMPLE
SURVEYS AS LEGAL EVIDENCE*


W. Edwards Deming
New York University


PURPOSE OF THIS PAPER


The purpose here is to view some of the problems that confront the
statistician when he presents the results of a sample survey as legal
evidence. One particular point is that the statistician, if he is to make his 
work useful, must distinguish between (a) what he as a statistician may 
say about the precision of the results of his survey, and (b) what an expert 
in the substantive field may conclude about the usefulness of the results. 
The statistician can testify only to the former, and possibly also about 
the variance between investigators, and between different methods, if he 
measured these differences. As a secondary purpose, we shall enquire into 
the meaning of a standard error, and its relation to a complete count and 
to the usefulness of the results—a point that is often overlooked, not 
only in testimony but in statistical reports. 

I have no magic nor all the answers to all the questions and difficulties 
that the statistician will encounter when he presents results as evidence. 
It is possible, however, to share some experiences with colleagues in this 
increasingly important role of statistical surveys; to acquaint them with 
some of the kinds of problems that may arise; and to suggest some general 
principles that will help the statistician to make his work more useful 
than it would be otherwise. 

At the outset I may explain that this paper will deal only with prob- 
ability samples. The defence of any other kind of sample is hardly a 
problem for a statistician anyhow, but rather for the substantive expert 
who may have enough knowledge of the material and of its variability 
to feel that he can testify one way or another with respect to the interpre- 
tation of the results of a judgment sample. 

A statistical survey, formulated and carried out by the dictates of the 
theory of probability, is to the statistician an exciting and remarkable 
achievement. It produces man’s best empirical knowledge, and it provides 
an objective measure of the amount of knowledge in the survey. The pre- 
cision desired can be aimed at and hit pretty accurately by planning 
in advance with the aid of the theory of probability and with bits of knowl- 
edge with respect to certain proportions, means, correlations, variances, 
and other statistical measures of the sampling units in the frame. Then, 
after the survey is completed, the precision that was actually reached is
calculable and expressible in an international standard of measure (the 
standard error) from the results of the survey itself. This final measure 
of the precision is objective, and is not a matter of opinion. It is not biased 
by incorrect assumptions that went into the planning. 

The statistician when he presents his results as legal evidence finds 
himself nevertheless at an uncomfortable disadvantage. He is usually 
talking to scholars, but not to fellow statisticians, nor indeed to other 
scientists, nor relating the results of his research to a trusting client or 
sponsor. He is teaching, but the techniques of the class room will not 
necessarily be the best ones for the presentation of evidence. 

Scholars in other disciplines are not all acquainted with the achieve- 
ments of probability sampling, yet the statistician must somehow explain 
his methods to them. Some of the people that he must deal with in legal 
evidence know sampling only as a failure to predict an election; they 
know not the distinction between (a) the standard error of sampling, (b) 
the errors common to complete counts and to samples, and (c) the error 
of a prediction. Other people think of sampling as a selection by judg- 
ment, carried out by someone who has established a reputation by a run 
of successes in the past. To still others, sampling is a desperate risk, a 
hazardous aimless random drawing of areas or of other elements to which 
anything may happen, and concerning which nothing can really ever be 
known except by comparison with a complete count, for which a sample is 
only a substitute to save time and money. 

In my own experience, a man questioned the existence of the theory 
for estimating the variance of a mean, originated by Gauss 120 years 
ago, and now used all over the world. I was once accused of “pyramiding”
my results (whatever that is), because I took the average of the averages 
of my 10 subsamples for an estimate of the whole, wherefore my standard 
error “must be viewed with some doubt.” 
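
The procedure objected to is ordinary replicated subsampling. When the subsamples are of equal size, the average of their averages is simply the average of all the units, and the scatter of the subsample averages gives the standard error directly. The Python sketch below illustrates this under stated assumptions: 10 independent subsamples of equal size, drawn by the same procedure. The function name and all figures are hypothetical and are not taken from any survey described here.

```python
from math import sqrt
from statistics import mean, stdev

def replicate_estimate(subsample_means):
    """Combine the means of k independent, equal-sized subsamples.

    Returns (estimate, standard_error): the estimate is the average of the
    subsample means; its standard error is the sample standard deviation of
    those means divided by sqrt(k).
    """
    k = len(subsample_means)
    if k < 2:
        raise ValueError("need at least two subsamples")
    return mean(subsample_means), stdev(subsample_means) / sqrt(k)

# With equal subsample sizes, the average of the averages equals the
# average of all the units taken together (hypothetical values).
subsamples = [[73, 76, 74], [75, 74, 76]]
all_units = [x for s in subsamples for x in s]
print(mean(mean(s) for s in subsamples), mean(all_units))

# Hypothetical means of 10 subsamples.
means = [74.1, 74.9, 74.3, 75.0, 74.6, 74.2, 74.8, 74.4, 74.7, 74.5]
estimate, se = replicate_estimate(means)
print(f"estimate = {estimate:.2f}, standard error = {se:.3f}")
```

Nothing here is "pyramided": the standard error rests only on the observed differences among the 10 subsample averages.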


DIRECT TESTIMONY: CROSS-EXAMINATION 


There is first of all direct testimony, wherein the statistician presents 
his results, after careful preparation in advance. He will usually simply
read his direct testimony into the record from typed copy. Direct testi- 
mony may take the form of questions and answers, the questions being 
read by the lawyer who has engaged the statistician. The questions should 
be framed so that they display the results of the survey in the form of 
valid statistical inferences. The questions must not sound as if they were 
digging for particular answers, even though both sides in the case know 
full well that the statistician is reading from prepared copy, and that the 
answers are exactly what the statistician believes to be essential to his
methods and to his results, regardless of the questions. 

In the preparation of testimony, the lawyer who engaged the statistician
will not try to influence the content of the statistician’s statements.
He will try to help the statistician to state his procedures and his infer- 
ences so that they will be clear. The inferences must be only what the 
statistician can support, as a scientist seeking truth. To bring out the 
truth in a scientific inference, one must not only state what he believes 
to be true, but he must say it so that his listeners will understand what 
he means, and not think that he has said something that he did not mean. 
A good lawyer can help immeasurably in achieving this aim. 

In giving evidence, the statistician is not fighting a case for either side. 
He is an expert witness, and he should appear as a professional man, 
with the sole aim of presenting the truth. This means that he must tell to 
the best of his ability what the figures mean. He should describe in full 
any difficulties that he encountered, and their possible limitations on the 
interpretations of his data. 

In some courts one may not read prepared testimony, in which case 
one can only prepare to present his testimony without the aid of his typed 
copy. He will of course still be able to present tables and charts, called 
exhibits. 

Usually during or immediately following direct testimony the opposing 
side asks only questions that will clear up simple failure to recognize 
technical terms, or to clarify some events with respect to their sequence 
in time. Questions that may bring out flaws in the testimony, they will 
usually reserve for further study, following which they will call the statis- 
tician to the stand for cross-examination. Here the questions are often 
well-prepared in advance, but the statistician must answer ex tempore. 
Here the statistician may be dismayed to find that
statements and interpretations that he thought were clear and objective 
are now misunderstood and misinterpreted, and his statistical principles 
challenged. 

When cross-examination comes, no matter what question comes, rele- 
vant or irrelevant, do the best that you can with it. Be cautious to stay 
within the field of competence that you have testified to in your qualifica- 
tions (vide infra). Groundwork in your direct testimony, in an attempt 
to give clear explanations of your procedures, of the statistical interpreta- 
tion of your results and of their standard errors, will help to keep the cross- 
examination on the track and to bring out the inherent scientific truth 
contained in your survey. 

To present the results of a survey in a case where millions of dollars
are involved, to ears unfamiliar with the power of modern statistical 
practice, is an experience that purifies the statistician’s thinking. Some- 
times the listeners are glad to accept the results of a good survey, and 
to learn something about modern survey-methods. At other times, they 
will declare that the statistician’s methods are new and untried, that his 
results are therefore not acceptable evidence; that his sample was too 
small; or finally, forsooth, that he has not explained the entire theory
of sampling so that everyone can understand exactly what he did and 
why, and that there is therefore no basis by which to judge whether his 
results have any meaning. 


ESSENTIAL INGREDIENTS OF THE DIRECT TESTIMONY 


The statistician’s statement of his qualifications, which usually comes 
in the first part of his direct testimony, is important. It is evidence by 
which the examiner or judge may decide, if the question arises, whether 
the statistician is qualified. It should therefore contain a full account of 
the statistician’s education and relevant experience. 

He may then present the purpose of the survey (an example of an as- 
signment will occur later), what he endeavored to do, the methods that 
he prescribed, the basis for these methods, the system and the observa- 
tions by which he satisfied himself that the procedures that he prescribed 
were understood and followed rigidly and faithfully; finally, the results 
and their standard errors and their interpretation; also the possible effects 
of any biases inherent in the procedure, and the possible effects of any 
difficulties encountered. All these points will go into the direct testimony. 

He should tell in simple words what the procedures actually were. He 
should limit theory to a few simple and well-established principles that 
illustrate the sampling procedures and the interpretation of the results. 
The truth and the whole truth means clarity, so that anyone may judge 
whether your results and your interpretation of the standard error are 
what you say they are. You can not hope to give a whole course in the 
theory of sampling, but you can make your procedures and their validity 
clear without doing so. The most convincing argument concerning your 
procedures and of your interpretations is that they conform to estab- 
lished international standards, and that they are used in a wide variety 
of experience. In this connection the document written by the United 
Nations Sub-Commission on Statistical Sampling entitled, “The presen- 
tation of sampling survey results” (UN Series C, No. 1, 1950) is of assist- 
ance; likewise the “Manual on the Quality Control of Materials” (1951) 
and other recommended and standard practices of the American Society
for Testing Materials, many of which have been adopted as standards in 
other parts of the world. 

A formula can cause trouble unless you explain pretty expertly how 
you used it. If in direct testimony you say that you used a formula to 
calculate in advance the size of sample required, when the fact is that 
you made a rough mental calculation and tempered it with judgment, 
or that you made the calculation years ago for similar work, and really 
did not make a fresh detailed calculation for this job, or if you did make 
one and then modified the answer to allow for some possible additional 
variance not fully represented in the formula, or to allow for some possible 
heavy additional cost of inspection or of interviewing because of probable 
bad weather in February, and if now upon cross-examination when people 
start asking questions you can not get exactly the same sample-size out 
of your formula as you actually used, you may find yourself very un- 
comfortable. The trouble is that people not accustomed to formulas will 
not understand how one uses theory. 

If you say that a certain constant in your formula for the required 
sample-size represents your advance estimate of the variability of the 
material that you sampled, someone may accuse you of prejudging the 
answer. The fact is, however, that this advance estimate does not invali- 
date in the slightest the standard error calculated from the results, nor 
cause any bias in the procedure. You must make this clear in your direct 
testimony. 
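
A sketch in Python of how an advance estimate of variability enters the planning and then drops out. It assumes a simple random sample, a planning rule that k = 3 standard errors of a mean should not exceed a stated margin, and it ignores the finite population correction; the rule, the function names, and the figures are illustrative assumptions, not the procedure of any particular survey. The planning figure `sigma_guess` appears only in `required_sample_size`; the standard error reported afterward is computed from the sample values alone.

```python
from math import ceil, sqrt
from statistics import stdev

def required_sample_size(sigma_guess, max_error, k=3.0):
    """Planning step (hypothetical rule): smallest n for which
    k * sigma_guess / sqrt(n) does not exceed max_error.
    sigma_guess is an advance estimate of variability."""
    return ceil((k * sigma_guess / max_error) ** 2)

def standard_error_of_mean(values):
    """After the survey: standard error of the sample mean, computed
    from the sample itself. The planning figure plays no part here."""
    return stdev(values) / sqrt(len(values))

n = required_sample_size(sigma_guess=20.0, max_error=2.0)
print(n)  # prints: 900
```

If the advance guess was too high or too low, the only consequence is a sample larger or smaller than needed; the standard error computed afterward is unaffected.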

In practice sample-sizes are based on both theory and experience, even 
though you do not make a fresh calculation for every sample-design. 
Theory is part of your experience. Without theory, experience has no 
meaning. Theory and experience together produce scientific advances. 
All this can be made clear, I believe. 

I proceed now to describe some of the other problems of exposition that 
have arisen, and to offer some suggestions toward meeting them. 


IMPLICIT FAITH IN THE COMPLETE COVERAGE, 
AND IN THE 10 PER CENT SAMPLE 


A complete coverage, no matter how carried out, and even though it is 
incomplete (as complete counts too often are), has weight in evidence. 
A sample, unless it is a 10 per cent sample, has two strikes against it to 
start with. People who are not statisticians assume that the sheer size of 
a complete coverage will somehow cover up its incompleteness and the 
flaws in the method of measurement or in the interviewing. They believe 
that a judgment sample, if it is big enough, will do the same; and that it 
will in addition overcome biases of the unknown probabilities of selection.

A 10 per cent sample has almost equal standing with a complete count 
—maybe even better than a 15 per cent sample. Why, or what 10 per 
cent, is hardly ever questioned, even by experts in quantitative subject- 
matter. 

The statistician, in the explanation of his sampling procedure, faces 
such preconceived ideas. The precision of a small sample, selected and 
estimated by an efficient probability procedure, will require justifica- 
tion. It is a fact that the aerial plant in a sample of 1000 to 1500 tele- 
phone poles will provide all the precision that one can use for the estima- 
tion of the average over-all physical condition of the entire aerial plant 
which might be worth $200,000,000. But without very careful prepara- 
tion to dispel preconceived ideas about complete counts and 10 per cent 
samples, the statistician must be prepared to face an objection on the 
ground that a sample of only 1 part in 1000 is not admissible as evidence. 
The man who objects may, without knowing it, own stock in a woolen 
mill that purchases a million pounds of wool and pays duty on it on the 
basis of a sample that weighs from 60 to 100 ounces. 

The troubles that people have in understanding the power of a small 
sample are often tied up with failure to understand that it is the absolute 
size (n) of the sample, and not its proportion (n/N) to the whole, which 
determines the standard error of the result. The statistician must be pre- 
pared to meet the man who thinks that to reach a prescribed precision 
in an estimated average rent, for example, a sample of dwelling units 
from a big city must be bigger than the sample from a small city, because 
the big city is bigger. 
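
The point can be made concrete with the usual formula for the standard error of a mean from a simple random sample drawn without replacement, in which the size N of the lot enters only through the finite population correction (N - n)/(N - 1). The Python sketch below assumes a hypothetical standard deviation of $15 in monthly rents and a sample of 400 dwelling units in each city; the figures are illustrative only.

```python
from math import sqrt

def se_mean(sigma, n, N):
    """Standard error of the sample mean for a simple random sample of n
    units drawn without replacement from N units whose standard deviation
    (population divisor N) is sigma."""
    fpc = (N - n) / (N - 1)  # finite population correction
    return sigma / sqrt(n) * sqrt(fpc)

for city, N in (("small city", 5_000), ("big city", 500_000)):
    print(f"{city:10}  N = {N:>7,}  n/N = {400 / N:.4f}  SE = ${se_mean(15.0, 400, N):.3f}")
```

The two standard errors differ by only about four per cent, and the smaller one belongs to the small city: a city a hundred times bigger calls for no bigger sample.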

With careful preparation, you can dispel such misunderstandings in 
an entertaining way, and in simple language. You can explain with black 
and white beans the statistical principles used, and why it is that the 
standard error of a sample is in practice hardly influenced at all by the 
size of the lot that it was drawn from. You can portray vividly how a 
pint jar of dried beans scooped up from a larger mixture of black and 
white beans will provide an estimate of the proportion black in the mix- 
ture; and that a sample of less than a pint would probably be sufficient. 
You may then observe, and your listeners will agree, that the mixture 
could as well be a carload of beans as a bushel of beans: the sample pro- 
vides as good an estimate of the proportion black for the carload as it 
does for the bushel, provided that in both cases the mixture is thoroughly 
mixed (an illustration borrowed from testimony presented by Professor 
John W. Tukey). In practice we accomplish thorough mixing with the 
use of a table of random numbers—a tool indispensable today in science. 

The nigh total failure of the size of a lot to have any influence on the
standard error of a random sample drawn therefrom is illustrated by
charts in Eugene L. Grant’s book, Statistical Quality Control (McGraw- 
Hill, 1946), page 345. Incidentally, such citations will often help the 
statistician’s listeners to appreciate the fact that his methods are in 
universal use. One may usefully refer to the ever-expanding depend- 
ence of all kinds of scientific, industrial, agricultural, and medical research 
on statistical theory; the use of statistical methods to attain extreme 
precision in industrial production; the necessity for proper statistical 
design in the comparison of two industrial processes, machines, or medical 
treatments; the growing reliance, in many parts of the world, on probability
samples in social and economic studies that are to guide important
decisions. 
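
The beans can also be mixed by computer. The Python sketch below uses a seeded pseudo-random generator in place of a table of random numbers: it draws repeated simple random samples of 1000 beans from a "bushel" and from a "carload" and compares the scatter of the estimated proportion black. The lot sizes (50,000 and 50,000,000 beans), the 30 per cent share of black beans, the sample size, and the number of repetitions are round hypothetical figures.

```python
import random
from statistics import pstdev

def sample_proportion_black(lot_size, n_black, sample_size, rng):
    """Draw bean positions at random without replacement; positions
    below n_black are black. Random selection does the 'mixing'."""
    drawn = rng.sample(range(lot_size), sample_size)
    return sum(1 for pos in drawn if pos < n_black) / sample_size

def scatter_of_estimates(lot_size, share_black=0.3, sample_size=1000, reps=300, seed=1):
    """Standard deviation of the estimated proportion black over many
    repetitions of the same sampling procedure."""
    rng = random.Random(seed)
    n_black = int(lot_size * share_black)
    estimates = [sample_proportion_black(lot_size, n_black, sample_size, rng)
                 for _ in range(reps)]
    return pstdev(estimates)

bushel = scatter_of_estimates(50_000)
carload = scatter_of_estimates(50_000_000)
print(f"bushel: {bushel:.4f}   carload: {carload:.4f}")
```

Both figures come out close to the square root of 0.3 × 0.7 / 1000, about 0.0145; a thousandfold difference in the size of the lot hardly shows.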

If you succeed in making your explanation clear, you will help your 
listeners to appreciate the contribution of modern statistical principles 
and techniques to scientific truth. They may be grateful, in the long run. 

Complex terms, flourished too freely, may alienate your listeners. Rely 
on patience, truth, and simple language. You can not afford to lose the 
attention of the examiner or judge; he is in a position to protect truth and
accuracy of statement. In cross-examination keep him on your side by 
your fairness and willingness to try to clear up any questions concerned 
with your sample. 


PRECISION, ACCURACY, AND STANDARD ERROR 


Two concepts that are important to make clear in any presentation are 
precision and accuracy. Most statisticians probably think that they know 
what these words mean. I must confess that experience under the fire of 
cross-examination taught me some new angles to their meaning, and 
taught me the importance of explaining in advance the limitations of a 
standard error. 

Precision is expressible by an international standard, viz., the standard 
error. It measures the typical size (strictly, the root-mean-square) of the differences between a complete
coverage and a long series of estimates formed from samples drawn from 
this complete coverage by a particular procedure of drawing, and proc- 
essed by a particular estimating formula. 
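
A small simulation in Python may help to show what this long series means. It builds a hypothetical "complete coverage" of 5000 items, draws 2000 simple random samples of 200 by the same procedure, estimates each by the sample mean, and compares the root-mean-square of the differences from the complete coverage with the textbook standard error. The population, the sizes, and the seed are arbitrary assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(7)  # fixed seed so the run can be repeated
# A hypothetical "complete coverage": values for 5000 items.
population = [rng.uniform(40, 100) for _ in range(5000)]
complete_coverage = fmean(population)
n = 200

# A long series of samples, each drawn by the same procedure (simple random
# sampling without replacement) and estimated by the same formula (the mean).
estimates = [fmean(rng.sample(population, n)) for _ in range(2000)]
differences = [e - complete_coverage for e in estimates]
avg_signed = fmean(differences)                      # close to zero
rms = sqrt(fmean([d * d for d in differences]))      # root-mean-square

N = len(population)
standard_error = sqrt((N - n) / (N - 1)) * pstdev(population) / sqrt(n)
print(f"average signed difference: {avg_signed:+.3f}")
print(f"root-mean-square difference: {rms:.3f}; standard error: {standard_error:.3f}")
```

The plain average of the signed differences comes out near zero, since overestimates and underestimates cancel; that is why the standard error is defined through squared differences.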

Great precision or a small standard error attached to an estimate does 
not mean that this estimate is necessarily highly accurate or useful. It 
does mean that the results of a complete coverage would have been the 
same within a very narrow margin of difference, had the complete cover- 
age been carried out with the same investigators, sharing the load pro- 
portionately, and with the same care as they expended on the samples. 

The so-called “expected value” of a sampling procedure (which of
course includes the formula for the estimate) is the same as the result 
of an attempted complete coverage of the same frame that the samples 
are to be drawn from (except for a possible bias in the formula, for 
which an upper and innocuous limit will be known). Both the complete 
coverage and the sample are subject to the same uncertainties and 
errors, such as inadequate supervision, nonresponse, wrong informa- 
tion, missing information, failure of workers to cover their whole assign- 
ments, and to find all the people or all the items. The only difference is 
that the sample has sampling error, which is the one error that we are 
best able to govern and to measure. The statistician measures the un- 
certainty introduced by sampling. The substantive expert judges 
whether the same operations would give accurate and useful informa- 
tion if applied to the entire frame. 
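
The statement that the expected value of a sampling procedure equals the result of the complete coverage can be checked exactly on a toy frame by listing every possible sample. The Python sketch below assumes simple random sampling without replacement and the sample mean as the estimating formula (which has no bias); the six unit values are hypothetical and stand for whatever the same workers would record, errors included, whether a unit fell in a sample or in a complete coverage.

```python
from fractions import Fraction
from itertools import combinations

def expected_value_of_procedure(frame, n):
    """Average of the sample mean over every possible simple random sample
    of size n from the frame (each sample equally likely)."""
    estimates = [Fraction(sum(s), n) for s in combinations(frame, n)]
    return sum(estimates) / len(estimates)

frame = [3, 8, 1, 9, 4, 6]                    # hypothetical unit values
complete_coverage = Fraction(sum(frame), len(frame))
print(expected_value_of_procedure(frame, 2), complete_coverage)  # prints: 31/6 31/6
```

Exact fractions are used so that the equality shows without rounding; a ratio-type formula could show a small bias here, which this sketch does not illustrate.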

The statistician will have drawn up the statistical procedures for the 
survey (the design of the sample, the instructions for drawing it, the 
instructions for tabulating the results and for computing the estimates 
and their standard errors). During the progress of the work, he should 
be on hand as often and as long as necessary to know that the company 
that retained him is following his instructions meticulously. He is then 
in a position to defend the validity of the standard error. If at any time 
he is not satisfied with the performance of the workers, it is better for 
him to terminate at once his relationship with the client. He should be 
sure that this responsibility is clear beforehand. 

A statistician will occasionally be called upon to give his opinion in 
regard to procedures that another statistician has drawn up and testified 
to, or to give his interpretation of the results, including the standard 
error. After he has a chance to examine the procedures, he may testify, 
if he agrees, that they are one of many possible probability designs, and 
that IF they were followed meticulously, the results and the standard 
errors have certain interpretations, which he may give if called upon to 
do so. He may require, before he testifies, that certain calculations be 
carried out, to help him to examine the magnitudes of any biases that 
he may suspect. He may require calculations of skewness, if he suspects 
that the estimate of the standard error is not sufficiently firm. The results 
of these investigations will guide his conclusions and his testimony con- 
cerning the precision of the results of the survey. He must not be satisfied 
to testify to what he knows; he must explain how certain aspects of the 
survey that he had no opportunity to examine could possibly affect the 
results. 
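
Where skewness is the concern, the calculation asked for can be as simple as the moment coefficient of skewness of the unit values. Under simple random sampling with replacement, the skewness of the distribution of a sample mean is that of the units divided by the square root of n, which gives one rough way to judge whether the normal-curve interpretation of the standard error is safe. The Python sketch below is an illustration under those assumptions; the function name and the data are hypothetical.

```python
from math import sqrt

def skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical unit values with a long right tail (a few very large items).
units = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9, 12, 18, 40]
g1 = skewness(units)
print(f"skewness of units: {g1:.2f}")
print(f"approx. skewness of a mean of 1000 such units: {g1 / sqrt(1000):.3f}")
```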

Even with familiarity with the job, and no matter how satisfied the
statistician may be with its execution, he still can not testify to
the inherent usefulness of the result. Unfortunately, he has no standard 
error of the usefulness of a result. Testimony on the usefulness of the re- 
sults will be left to the substantive expert—the engineer, the chemist, the 
physician, the population expert, the agricultural expert. The usefulness 
of a result is not a problem of sampling; it deals rather with the method 
of measurement and with reasons why the method used will produce data 
that will satisfy a particular need. The method would be the same whether 
the survey were a complete coverage or a sample. 

In cross-examination the opposition may tempt the statistician beyond 
the sphere of his competence. The statistician must try to answer all 
questions politely and simply, yet he must stay within the limitations of 
his own ability and of the standard error. He certainly has a right to say 
he does not know the answer to a question that is beyond his compe- 
tence and beyond his direct knowledge. 

Although he can not testify to the inherent usefulness of the result, 
the statistician can certainly make it clear that he would not have associ- 
ated himself with the study had he not been sure in advance that it would 
be executed rigidly in conformance with his specifications, and that the 
methods of inspection, interviewing, and questioning, although beyond 
his qualifications, would be satisfactory and produce useful data. He may 
do this without professing to be an expert in the subject-matter, as he 
may declare that he has confidence in Mr. So and So (expert in the sub- 
ject-matter), who has testified, or will, concerning these things. 

This division of responsibility between the statistician and the expert 
in the subject-matter should not be difficult to explain, but it is easy to 
forget to do it; and still easier later on in cross-examination to be lured 
across the border into the subject-matter and into trouble. 

The following excerpt represents a statistician’s attempt in direct 
testimony to state what his job was; to put a limitation on his assignment,
and hence on what he could testify to in cross-examination. The
case involved the use of samples of items of telephone plant, the aim 
being to obtain a figure for the average over-all per cent physical condi- 
tion of the entire property. 

Q.¹ Doctor, for what purpose were you engaged by the Illinois Bell
Telephone Company?
A. I was told that this company proposed to make a survey to determine the
physical condition of its plant. They asked me to prescribe statistical
procedures by which to select samples of items of plant for inspection,
such as poles, wire, cable, telephones, relays, central office equipment.
The samples must determine within narrow limits of precision what
result would be obtained for the average over-all per cent condition by a
complete 100% inspection of all the items in all the classes of plant that
were to be inspected, with the same inspectors, and with the same care
as was exercised on the samples, were such a thing possible.

This assignment carried with it the responsibility for prescribing
the procedures for summarizing the results for each class of plant, once
the code-values assigned by the inspectors were translated into
percentages, and for combining the per cent conditions of the several classes
of plant into the over-all average per cent condition of all the classes
that were to be inspected. A necessary part of the assignment was to
provide procedures by which to calculate the standard error of the
precision of the result obtained for the over-all average per cent condition.

My assignment did not include the responsibility for the procedures
for inspecting any item, nor for the numerical values that translated the
inspectors’ codes into percentages. Neither had I any responsibility for
determining the weights of the various classes of property. These
problems are the same whether one uses sampling or not. These phases of the
work have been described by Mr. Coxe (General Staff Engineer).

Q. Does Company Exhibit No. 112 (Sampling Procedures for Drawing the
Items of Property for Field Inspection) contain the procedures that you
prescribed?

A. Yes sir, it does.

1 The Illinois Commerce Commission, Docket No. 39126, 1951, and Docket No. 41606, 1954: The
Illinois Bell Telephone Company in the matter of the proposed advance in rates. The passage printed
here is testimony prepared in advance, and is not necessarily the same word for word in the record.
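
The combination described in this testimony, weighted per cent conditions for classes of plant and a standard error for the over-all figure, can be sketched in Python. The sketch assumes that the weights (which, per the testimony, lay outside the statistician's responsibility) are fixed numbers and that the classes were sampled independently, so that the variances of the weighted class results add. The class weights, per cent conditions, and standard errors below are hypothetical and are not those of the Illinois Bell study.

```python
from math import sqrt

def combine_classes(classes):
    """classes: list of (weight, per_cent_condition, standard_error).

    Returns the weighted over-all per cent condition and its standard error,
    treating the weights as fixed and the class samples as independent.
    """
    total_weight = sum(w for w, _, _ in classes)
    overall = sum(w * p for w, p, _ in classes) / total_weight
    variance = sum((w / total_weight) ** 2 * se ** 2 for w, _, se in classes)
    return overall, sqrt(variance)

# Hypothetical classes of plant: (weight, per cent condition, standard error).
plant = [(40.0, 72.0, 0.3), (35.0, 76.0, 0.4), (25.0, 76.5, 0.5)]
overall, se = combine_classes(plant)
print(f"over-all per cent condition {overall:.2f}, standard error {se:.2f}")
```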


Later on came the following explanation of the standard error of the 
result: 


Q. What is your interpretation of the standard error of this study? 

A. The sampling precision of this study is expressed by the over-all stand- 
ard error, which turned out to be .19 per cent. This standard error is not 
a matter of opinion nor of expert judgment, but is objective, as it is 
calculated by the laws of probability from the results themselves. The 
interpretation of this standard error is simple: I may say with a high de- 
gree of assurance that the maximum uncertainty that one may attach 
to the over-all per cent condition because of the introduction of sampling, 
can not be, at the outside, more than three times the standard error. In 
other words, any uncertainty in the figure 74.5% (the final result) which 
can be attributed to the fact that the company used samples instead of a 
complete and total inspection of every item, with the same care as was 
exercised on the samples, were such a thing possible, can not exceed .57 
per cent. 
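
The arithmetic of this answer, in Python, using the two figures stated in the testimony (the result, 74.5 per cent, and the over-all standard error, .19 per cent) and the witness's allowance of three standard errors:

```python
estimate = 74.5              # per cent, the final result in the testimony
standard_error = 0.19        # per cent, the over-all standard error
margin = 3 * standard_error  # the witness's "at the outside" allowance
low, high = estimate - margin, estimate + margin
print(f"{estimate} ± {margin:.2f} per cent, i.e. {low:.2f} to {high:.2f}")
# prints: 74.5 ± 0.57 per cent, i.e. 73.93 to 75.07
```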


THE PERMANENCE OF THE STANDARD ERROR CONTRASTED WITH THE 
TEMPORAL CHARACTER OF ACCURACY AND USEFULNESS 


In a probability sample (the only type of survey to be considered here) 
the precision is calculable from the results, as I mentioned in the opening 
paragraphs. In practice, the size of the sample will be sufficient to provide 
a firm estimate of the standard error. This was the case in the excerpt
above. One may say with a high degree of assurance that in a long series
of repetitions of this sampling procedure, only about 2.3 per cent of the
results would fall more than 2 standard errors above the result of the
complete coverage, and about 2.3 per cent more than 2 standard errors
below. Practically none of the long series would fall beyond 3 standard
errors either way.
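
The percentages quoted here are areas under the normal curve, which assumes that the estimates from repeated samples follow the normal distribution closely. They can be checked with Python's standard library:

```python
from statistics import NormalDist

z = NormalDist()                     # standard normal curve
above_2_se = 1 - z.cdf(2)            # share more than 2 standard errors above
beyond_3_se = 2 * (1 - z.cdf(3))     # share more than 3 standard errors away, either side
print(f"more than 2 SE above: {above_2_se:.1%}; beyond 3 SE either way: {beyond_3_se:.2%}")
# prints: more than 2 SE above: 2.3%; beyond 3 SE either way: 0.27%
```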

It is another thing, however, to say whether the complete coverage, 
were it possible, would produce useful information. The inherent ac- 
curacy of the method of measurement (the interviewing, the question- 
naire, the method of inspection), and the usefulness of the information, 
whether obtained by a complete coverage or by a sample, is a matter for 
the substantive expert to testify to, as explained earlier. 

The main difference between a sample and a complete count is that 
the sample possesses an error of sampling. The statistician testifies in 
regard to this. A sufficient degree of precision is necessary for the useful- 
ness of sample results, but it does not guarantee their usefulness. 

The inherent accuracy and usefulness of the procedures of measure- 
ment will change from time to time as the substantive experts develop 
new concepts of the kind of information that they require to solve new 
and changing problems. Anyone who has followed the changing concepts 
of the characteristics of the labor force, or the changing concepts of a farm, 
or of family-budget studies, or the changing concepts of the desirable 
characteristics of fibres and of textiles, will know that no definitions or
methods of measurement stay fixed.

In contrast, the standard error of a procedure of sampling remains 
fixed with time; likewise the interpretation of the standard error. The 
validity of the standard error does not depend on the economy or clever- 
ness of the design of the sample. It depends only on careful execution 
and on the rigid use of probability methods in accordance with some pre- 
scribed statistical plan. For this reason, the standard error of a sampling 
procedure and its interpretation, remain valid, even though new ac- 
vances in theory point the way to more economical sampling procedures 
by which to obtain the same standard error. 


REFERENCES 


There is apparently no previous literature that deals with the presenta- 
tion of modern statistical procedures and their results in legal evidence. 
Fortunately, however, sampling has received attention from the legal 
standpoint in a paper by Frank R. Kennedy, who supplied copious remarks
and references to cases in which samples of one kind or another
were offered in evidence.²

In conclusion, it is a pleasure to acknowledge aid from a number of 
friends, chiefly from Mr. Melvin F. Wingersky, Attorney at Law, and a 
member of this Association. This acknowledgment is of special interest, 
because he cross-examined me on several occasions with great vigor. 
Valuable help came also from Mr. Harlow A. Coxe, General Staff Engi- 
neer of the Illinois Bell Telephone Company; also from Mr. Howard L. 
Jones, statistician, and from Mr. Gordon Winks, General attorney, all 
with the same company. Finally, I have had the benefit of a long asso- 
ciation and many conversations on this subject with Professors John W. 
Tukey and Frederick F. Stephan of Princeton. 


2 Frank R. Kennedy, “Some legal aspects of sampling,” Industrial Quality Control, Vol. VII
(January and March 1951).








ACCURACY OF AGE REPORTING IN THE 1950 
UNITED STATES CENSUS 


Robert J. Myers
Social Security Administration 


One common human error is to round figures even though precise
results might be desired or requested. This is particularly evident
in census returns where the age in integral years is sought. Such
rounding arises especially for ages ending in the digit 0 and, to a lesser
extent, for ages ending in 2, 5, and 8. This paper will investigate the extent
of preference for certain digits of age in the 1950 United States census 
and will indicate the extent of improvement that has occurred since 
earlier censuses, as well as giving certain summary data for several 
other countries. The analysis will be carried out using the “blended” 
method.¹


DESCRIPTION OF METHOD OF ANALYSIS 


One method for showing the degree of preference for certain digits 
of age consists of starting at a given age, say 20, and adding up the 
population for all ages ending in 0, all ending in 1, etc. Then the popu- 
lation at each digit is expressed as a percentage of the total population; 
any considerable deviation from 10 per cent would be taken as indica- 
tion of bias in age reporting for that particular digit. This procedure, 
however, does not yield truly valid results, since it is not proper simply
to add the overall populations at each digit starting at a particular
age: because the population generally declines with age, the “leading” digits
then naturally occur more frequently among the persons counted than the “following” ones.
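
The objection can be seen in a short Python sketch. It applies the simple count starting at age 20 to a hypothetical population that declines steadily with age and has no digit preference at all; the function name, the age range (seven decades from age 20), and the population figures are illustrative assumptions.

```python
def naive_digit_percentages(pop, start, n_decades):
    """Simple count: from age `start` (assumed to end in 0), total the
    population at ages ending in each digit over n_decades decades and
    express each total as a percentage of the whole."""
    totals = [sum(pop[start + d + 10 * m] for m in range(n_decades))
              for d in range(10)]
    grand_total = sum(totals)
    return [100 * t / grand_total for t in totals]

# Hypothetical population declining steadily with age, with no digit
# preference at all (ages 20 to 99).
pop = {age: 1000 - 10 * (age - 20) for age in range(20, 100)}
print([round(p, 2) for p in naive_digit_percentages(pop, 20, 7)])
```

The percentages fall steadily from about 10.7 at digit 0 to about 9.3 at digit 9, a spurious "preference" produced only by the choice of starting age.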

The “blended” method overcomes this objection by allowing each 
digit in turn to be the “initial” one. The ten separate results are then 
summed, and a percentage distribution by digit is computed. The 
justification for this method is largely empirical, based on general 
reasoning and logic, with the further point that it produces proper 
results for smooth life-table data (i.e., it shows no digit preference).

As an example of how the “blended” method operates, when the 
count is started at age 20, the population considered at unit digit 0 is 
the sum of those at ages 20, 30, 40, etc. For the nine cases when the 
count is started successively at ages 21 to 29, the population considered 
at digit 0 begins with that at age 30 instead of age 20. Correspondingly, 
as to the population at digit 1, when the count is started in turn at 





1 See Robert J. Myers, “Errors and bias in the reporting of ages in census data,” Transactions of 
the Actuarial Society of America, XLI (1940). Reproduced in Handbook of Statistical Methods for De- 
mographers, Bureau of the Census, 1951. 
ages 20 and 21, included are ages 21, 31, 41, etc., while when the count
is started at any of ages 22 to 29, included are ages 31, 41, etc.²
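A minimal Python sketch of the blended count described above. It assumes the population is supplied as a dictionary from single year of age to number of persons. The function name, and the rule for cutting off each count at the top, are our assumptions. We cut each count to whole ten-year blocks so that every digit contributes the same number of ages to each count. The starting ages 23 to 32 and the final age 99 come from the table footnotes.

```python
def blended_digit_percentages(pop_by_age, first_start=23, last_age=99):
    """Blended distribution by unit digit (sketch; truncation rule assumed).

    Each of the ten consecutive starting ages acts once as the "initial" age.
    Each count runs over whole ten-year blocks ending at or before last_age.
    The ten counts are summed digit by digit and converted to percentages.
    """
    totals = [0.0] * 10
    for start in range(first_start, first_start + 10):
        decades = (last_age - start + 1) // 10  # whole 10-year blocks that fit
        for age in range(start, start + 10 * decades):
            totals[age % 10] += pop_by_age.get(age, 0)
    grand = sum(totals)
    return [100.0 * t / grand for t in totals]


# Usage with a hypothetical, perfectly even population: no digit preference.
even = {age: 1000 for age in range(0, 101)}
print([round(p, 1) for p in blended_digit_percentages(even)])
```

With this truncation rule, an even population comes out at exactly 10 per cent for every digit. That matches the text's requirement that the method show no digit preference for smooth data.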

The result of these calculations then is a percentage distribution of 
the population at each of the 10 digits. If no heaping were present, 
each figure would be very close to 10 per cent. Conversely, any sizable 
deviation from 10 per cent indicates the presence of such inaccuracy. 
A relative index of the amount of preference of age for any census dis- 
tribution can be obtained by summing up the absolute deviations from 
10 per cent in each case. 

Bachi² has suggested a somewhat preferable index, which amounts
to half the previously described index³ and which will hereafter be used
as the index of heaping. This index has a certain significance since, as 
Bachi says, it “estimates the proportion of persons in the population 
who return their ages with an inaccurate unit digit and thus has the 
advantage of being more easily understood.” Bachi goes on to take the 
extreme case where all people report the same unit digit, in which case
the index would be 90 per cent, indicating that 90 per cent of the people
returned inaccurate unit digits. 
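The two indexes can be written out directly from a digit distribution in per cent. This short sketch (the function names are ours) reproduces Bachi's extreme case. When everyone reports one digit, the total absolute deviation is 180 and Bachi's index is 90.

```python
def total_absolute_deviation(pcts):
    """Sum of |p - 10| over the ten digits: the first index described above."""
    return sum(abs(p - 10.0) for p in pcts)


def bachi_index(pcts):
    """Half the total absolute deviation. Because the percentages sum to 100,
    this equals the summed excess over 10 at the over-reported digits alone."""
    return total_absolute_deviation(pcts) / 2


extreme = [100.0] + [0.0] * 9  # everyone reports the same unit digit
print(total_absolute_deviation(extreme), bachi_index(extreme))  # 180.0 90.0
```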

In actuality, Bachi’s index more properly indicates the minimum 
proportion of persons returning their ages with an inaccurate unit 
digit since certain errors may be self-cancelling. Thus, taking the com- 
mon case where digit 0 is over-reported, there may be some persons 
truly having an age ending in 0 who report some other age; these 
persons are, of course, far more than offset by those who inaccurately 
report themselves at an age ending in 0. In the extreme case, of course, 
there might be 10 per cent of the persons reported at each of the 10 
digits of age, which would yield an index of 0; yet it is theoretically 
conceivable that every person has returned his age with an inaccurate 
unit digit, but by chance there has been complete offsetting. At any 
rate, however, the use of the index as developed by Bachi seems prefer- 
able because it does have a certain real meaning. 
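The offsetting case is easy to demonstrate with hypothetical microdata. Suppose there is one person at each true age from 20 to 119, and every person overstates his age by exactly one year. Every unit digit is then wrong, yet the reported digits remain evenly spread, so the index is 0.

```python
def digit_percentages(ages):
    """Per cent of persons at each unit digit 0..9."""
    counts = [0] * 10
    for a in ages:
        counts[a % 10] += 1
    return [100.0 * c / len(ages) for c in counts]


def bachi_index(pcts):
    return sum(abs(p - 10.0) for p in pcts) / 2


true_ages = list(range(20, 120))       # hypothetical: one person per age
reported = [a + 1 for a in true_ages]  # every person misstates by one year

share_wrong_digit = sum(t % 10 != r % 10 for t, r in zip(true_ages, reported)) / len(true_ages)
print(bachi_index(digit_percentages(reported)), share_wrong_digit)  # 0.0 1.0
```

This is why the index is best read as a minimum: it registers only the net imbalance between digits, not the gross number of misreported digits.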


ANALYSIS OF U. S. CENSUS DATA


Table 1 shows the preference for digits of age in the total United 
States population for various censuses. Considerable heaping at digits 
0 and 5 occurred in the past, although there has been much improve- 
ment in the past 70 years. Thus, in 1880, digit 0 showed a relative 





2 A very similar method of analysis, which in practice produces very much the same results, was
developed independently by Roberto Bachi in “Measurement of the tendency to round off age returns,” 
Proceedings of the International Statistical Congress, Rome, 1953. 

3 Or in other words, is based on only the preferred, or overstated, digits (or conversely, only on the
disliked, or understated, digits). 
excess of 68 per cent over the normal proportion of 10 per cent, theoreti-
cally to be expected, whereas by 1950 this excess amounted to only 12
per cent. The heaping at digit 5, which in 1880 was half as large as 
that at digit 0, has by 1950 virtually vanished. Throughout the period 
a slight heaping at digit 8 has apparently been present. The greatest 
understatement has occurred for digit 1, which suggests that the heap-
ing at digit 0 draws chiefly on ages ending in 1 rather than 9. The index
of preference has decreased steadily from more than 10 in 1880 to 
almost 2 in 1950. 
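The index can be recomputed from the percentages printed in Table 1. In the sketch below, the percentages are transcribed from that table, with a few decimal points lost in the scan restored so that each column sums to 100. Because the inputs are rounded to one decimal, the recomputed figures may differ slightly from the published index row. The relative excess at digit 0 reproduces the 68 and 12 per cent quoted above.

```python
# Per cent of population at unit digits 0..9, transcribed from Table 1.
TABLE_1 = {
    1880: [16.8, 6.7, 9.4, 8.6, 8.8, 13.4, 9.4, 8.5, 10.2, 8.2],
    1890: [15.1, 7.4, 9.7, 9.1, 9.0, 12.3, 9.6, 8.9, 10.4, 8.5],
    1900: [13.2, 8.3, 9.8, 9.3, 9.5, 11.3, 9.4, 9.3, 10.2, 9.7],
    1910: [13.2, 7.7, 10.2, 9.2, 9.4, 11.5, 9.6, 9.1, 10.7, 9.4],
    1920: [12.4, 8.0, 10.2, 9.4, 9.4, 11.3, 9.7, 9.4, 10.6, 9.6],
    1930: [12.3, 8.0, 10.3, 9.4, 9.6, 11.2, 9.6, 9.3, 10.5, 9.8],
    1940: [11.6, 8.5, 10.4, 9.6, 9.7, 10.7, 9.6, 9.6, 10.3, 10.0],
    1950: [11.2, 8.9, 10.2, 9.7, 9.7, 10.6, 9.8, 9.7, 10.2, 10.1],
}


def heaping_index(pcts):
    """One-half the sum of absolute deviations from 10 per cent."""
    return sum(abs(p - 10.0) for p in pcts) / 2


def relative_excess(pct):
    """Excess over the expected 10 per cent, as a per cent of 10 per cent."""
    return 100.0 * (pct - 10.0) / 10.0


for year, pcts in TABLE_1.items():
    print(year, round(heaping_index(pcts), 1), round(relative_excess(pcts[0])))
```

For 1880 this gives an index of 10.4, consistent with the "more than 10" in the text.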


TABLE 1


PREFERENCE FOR DIGITS OF AGE IN THE TOTAL CONTINENTAL
UNITED STATES POPULATION FOR VARIOUS CENSUSES,
POPULATION AT EACH DIGIT OF AGE AS PER CENT
OF TOTAL POPULATION(a)


Digit                                 Census
of Age   1880   1890   1900   1910   1920   1930   1940   1950(b)
  0      16.8   15.1   13.2   13.2   12.4   12.3   11.6   11.2
  1       6.7    7.4    8.3    7.7    8.0    8.0    8.5    8.9
  2       9.4    9.7    9.8   10.2   10.2   10.3   10.4   10.2
  3       8.6    9.1    9.3    9.2    9.4    9.4    9.6    9.7
  4       8.8    9.0    9.5    9.4    9.4    9.6    9.7    9.7
  5      13.4   12.3   11.3   11.5   11.3   11.2   10.7   10.6
  6       9.4    9.6    9.4    9.6    9.7    9.6    9.6    9.8
  7       8.5    8.9    9.3    9.1    9.4    9.3    9.6    9.7
  8      10.2   10.4   10.2   10.7   10.6   10.5   10.3   10.2
  9       8.2    8.5    9.7    9.4    9.6    9.8   10.0   10.1

Total   100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0

Index(c)   [illegible in source scan]


a The method of analysis is the “blended” method as described in the text, using starting ages
23 to 32 and ending at age 99 in all cases. These percentages, in effect, relate the reported population
at each digit of age to the “true” population.

b Based on 20 per cent sample data.

c The index is one-half the sum of the deviations from 10.0 per cent, each taken without regard to
sign. These indexes, in effect, indicate the minimum net proportion of persons who return their ages with
an inaccurate unit digit.


Table 2 indicates the variation in preference for digits of age by sex, 
race, and nativity for the 1950 census. For native-born whites, there is 
very little preference for any particular digit of age although there is 
a small heaping at digit 0, and to some extent at digit 5. The index of 
preference is at the very low level of 1.3 for native-born white men 
although it is somewhat higher for women in the corresponding group.
The foreign-born whites show greater inaccuracy in the reporting of
ages than native-born whites, with the index of preference being about 
twice as high, while for nonwhites, the index is about 4 times as high 
as for native-born whites. 


TABLE 2


PREFERENCE FOR DIGITS OF AGE BY RACE AND SEX IN
1950 CENSUS OF CONTINENTAL UNITED STATES
POPULATION AT EACH DIGIT OF AGE AS PER
CENT OF TOTAL POPULATION(a)

a The method of analysis is the “blended” method as described in the text, using starting ages 23
to 32 and ending at age 99 in all cases. These percentages, in effect, relate the reported population at
each digit of age to the “true” population. Based on 20 per cent sample data.

b The index is one-half the sum of the deviations from 10.0 per cent, each taken without regard to
sign. These indexes, in effect, indicate the minimum net proportion of persons who return their ages
with an inaccurate unit digit.

There is considerable evidence of greater accuracy of reporting in the 
1950 census since the indices for all categories are considerably lower 
than in previous censuses (see Table 3). In each category and for each 
census, the index of preference is lower for men than for women, with 
the relative differential being about 4 for native-born whites although 
generally somewhat less than this for foreign-born whites and non- 
whites. 


ANALYSIS OF CENSUS DATA FOR OTHER COUNTRIES 


Some indication of the relative accuracy of the reporting of digits 
of age in the 1950 United States census may be obtained by considering 
TABLE 3


INDICES SHOWING PREFERENCE FOR DIGITS OF AGE, BY
RACE AND SEX, CONTINENTAL UNITED STATES,
1930, 1940, AND 1950 CENSUSES(a)


                                         Census
Sex and Race                    1930      1940      1950(b)

Men, Native-born White
Men, Foreign-born White
Men, Non-white

Women, Native-born White
Women, Foreign-born White
Women, Non-white                12.


a The method of analysis is the “blended” method as described in the text, using starting ages 23
to 32 (except for white population in 1940, for which ages 35 to 44 were, of necessity, used) and ending
at age 99 in all cases. The index, in effect, indicates the minimum net proportion of persons who return
their ages with an inaccurate unit digit.

b Based on 20 per cent sample data.


the indices of preference in recent censuses in other countries where it 
would be expected that, because of a high degree of literacy, good re- 
porting would be obtained. Data by single years of age sufficient to 
make such an analysis are available for Australia (1947), Canada 
(1951), and Great Britain (1951), with the results being as follows: 








                              Index of Preference
Country                       Men          Women

Australia
Canada
Great Britain
United States, Total
United States, White





As contrasted with the other three countries, the index of preference 
for the United States is significantly higher, especially for women. 
However, when only the white population of the United States is con- 
sidered, the index for men compares quite favorably with those for the 
other countries, but for women the United States index is significantly 
higher. In fact, it is pertinent to note that for the other countries, there 
is relatively little difference in the accuracy of reporting of ages as 
between men and women, whereas for the United States women defi-
nitely do not report as accurately as men.

In both Canada and Great Britain, the only significant evidence of
heaping is for digit 0, and even this is of relatively minor importance,
representing a relative overstatement of at most 10 per cent. For Australia,
the situation is somewhat different since there is only a slight
indication of heaping at digit 0; in fact, there is evidence of a certain 
amount of heaping at digit 7. This peculiarity possibly arises because 
the census was taken in 1947, and the question on age was framed so 
as to ask for year of birth. Accordingly, a very sizable number of per- 
sons reported the “round” year, 1900 (the number shown at age 47 
being 10 per cent greater than the average at ages 46 and 48). 
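The arithmetic behind the Australian anomaly can be sketched as follows. In a 1947 census, a reported birth year of 1900 becomes age 47, which ends in 7. The sketch assumes age is computed simply as census year minus birth year, ignoring whether the birthday had yet passed. The single-age excess measure is the one quoted above: the count at age 47 compared with the mean of ages 46 and 48. The counts in the usage line are hypothetical.

```python
CENSUS_YEAR = 1947


def age_from_birth_year(birth_year, census_year=CENSUS_YEAR):
    # Simplification: treats everyone as having had this year's birthday.
    return census_year - birth_year


def single_age_excess(counts, age):
    """Relative excess at `age` over the mean of the two adjacent ages."""
    neighbour_mean = (counts[age - 1] + counts[age + 1]) / 2
    return counts[age] / neighbour_mean - 1


age = age_from_birth_year(1900)
print(age, age % 10)  # 47 7
print(single_age_excess({46: 100, 47: 110, 48: 100}, 47))  # about 0.10
```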


SUMMARY AND CONCLUSIONS 


The accuracy of the reporting of ages in the 1950 United States 
census has been in accord with the trend of steady improvement pre- 
vailing over the last 70 years. For certain groups, especially native- 
born white males, age reporting now, at least insofar as preference for 
digits of age is concerned, has reached almost as great accuracy as can 
ever be expected. There is, however, significant room for improvement
in the nonwhite population. Furthermore, the reporting of ages by 
women is significantly less accurate than for men despite the fact that 
in various foreign countries there is little difference between the sexes 
as to accuracy of age reporting. 











VALIDATION OF MORBIDITY SURVEY DATA BY 
COMPARISON WITH HOSPITAL RECORDS* 


NEDRA B. BELLOC
California State Department of Public Health†


HOUSEHOLD sample surveys are being used increasingly¹ to obtain
information on the status of the health of the population. Physi-
cians, and others who are to use many of the data so collected, are
likely to raise the question: When lay interviewers, no matter how care- 
fully trained, question lay respondents on a subject as complex as 
illness, are the results sufficiently accurate to justify the continued use 
of this method of measuring morbidity? 

Validation of the measures of morbidity by comparison with an 
independent criterion can be done only if the person who reported 
illness received some medical service. Reports of absence from school 
or work are not necessarily proof of illness. This means that a large 
proportion of illnesses reported in household surveys are not subject to 
verification, since they are not medically attended. 

In almost all of the earlier illness surveys some attention was paid 
to the assessment of the accuracy of the diagnostic information ob- 
tained [7, 11, 13]. In some cases the effort was directed at “improving” 
the diagnoses given by respondents [16, 17] by the substitution of 
medical reports for those given in the surveys. Only in the National 
Health Survey [8, 14] were sufficient data presented to enable an evalu- 
ation of the extent of agreement between the family’s and the physi- 
cian’s diagnoses. In that study, however, as in several of the others 
[2, 5, 6, 10], the diagnoses as given by the family were submitted to 
the physician for confirmation or change, creating an obvious preju- 
dice in favor of agreement of the diagnosis. 

The method used in these surveys has been the checking of a survey 
report against a corresponding physician record or hospital record. The 
degree of agreement has been described as a percentage of records in 
which the diagnosis agreed out of those which were checked. Another 





* This investigation was supported by a research grant (RG-1792) from the National Institutes of 
Health, U. S. Public Health Service. Presented at the 113th Annual Meeting of the American Statistical
Association, Washington, D. C., December 29, 1953. 

† Acknowledgment is made of the planning and preliminary work on this study which were done
by Arthur Weissman. 

1 Such studies as the Pittsburgh Arsenal Health District Studies, Canadian Sickness Survey, surveys 
sponsored by the Commission on Chronic Illness in Hunterdon County and Baltimore, the Special 
Research Project of the Health Insurance Plan of Greater New York, as well as the California Morbidity 
Research Project. 
aspect of the validity of survey data, that of completeness of reporting
of the fact of illness, has been virtually ignored. 

It is to be noted that, while it is valuable in some ways, the matching 
of individual records with criterion sources does not give a statistical 
measure of bias. Individual reports might differ considerably from the 
corresponding reports in the criterion source, and yet the errors might 
compensate in such a way that the over-all description of morbidity 
given by the two sources could be identical. The statistician is inter- 
ested in a test of validity which will enable him to answer the question, 
“Does the measure of morbidity obtained by a household survey differ 
significantly from that obtained by reference to medical records?” 
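Here is a toy illustration of the point above, using hypothetical per-person counts of hospital stays. Every individual report disagrees with the criterion, yet the totals agree exactly. Record matching measures the gross disagreement; the statistician's question concerns the net difference. The function name is ours.

```python
def net_and_gross_difference(survey, criterion):
    """Net: difference of totals (aggregate bias).
    Gross: sum of per-person absolute disagreements."""
    if len(survey) != len(criterion):
        raise ValueError("records must be paired person by person")
    net = sum(survey) - sum(criterion)
    gross = sum(abs(s - c) for s, c in zip(survey, criterion))
    return net, gross


# Hypothetical hospital-stay counts for four persons.
survey = [1, 0, 2, 0]
records = [0, 1, 1, 1]
print(net_and_gross_difference(survey, records))  # (0, 4)
```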

Comparison with “criterion sources” can be made without the impli- 
cation that these criteria are more accurate than the survey. In some 
cases, it is quite probable that the household survey reports are more 
complete than the medical records. If the completeness of reporting in 
the household survey is to be measured, however, it is necessary to have 
a criterion source which is in itself extremely accurate. Otherwise, the 
household survey will produce a large number of over-reports solely 
because of the inadequacy of the criterion. 

This paper will present some of the results of the validation checks 
with records of hospitalization which were done in a survey undertaken 
by the California State Department of Public Health in San Jose in 
the spring of 1952 [1, 18]. The method used was to collect abstracts of 
records from the hospitals serving the City of San Jose for all persons 
resident in the city, and then to locate in this file of abstracts the rec- 
ords of hospitalization for all persons in the household sample survey. 
The two sets of reports thus obtained, from the household survey and 
from hospital records, form the bases for the computation of various 
measures. The net differences² between these two sets of statistics are
examined in this report.³

The measurement of hospitalized illness by securing records from 
hospitals for a sample of the population required intensive work with 
the hospitals, and would not be practical except in a limited area such 
as San Jose. This study could not have been done without the patience, 
interest, and whole-hearted cooperation of administrative and medical 
record personnel in the participating hospitals. 





2 Marks and Mauldin [12] have classified errors in surveys as of three types: sampling, response,
and processing. Since our survey reports are processed data, the error being studied here, while primar- 
ily response error, includes errors of processing. 

3 Since there is some interest in the extent of agreement on “matched cases,” the method used in 
other surveys for validation, footnote references are made to analyses made by this method also. 
Data on Hospitalized Illness Obtained from Hospital Records


For the seven months prior to the beginning of the survey, and for 
the five months of the survey, the four general hospitals and the State 
mental hospital in the area prepared abstracts of the records of pa-
tients living in San Jose.⁴ These abstracts included the name, age, sex,
address, admission date, discharge date, days in hospital, surgery per- 
formed, and admission and final diagnoses. The hospitals permitted 
checking of their records by members of the staff of the Project to in- 
sure a complete file of abstracts. These abstracts were filed by Soundex 
code of the surname, and the final diagnoses were coded according to 
the International Statistical Classification of Diseases, Injuries and 
Causes of Death. [9] 
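The paper does not spell out its Soundex rules. The sketch below follows one common variant, the American Soundex convention. It keeps the first letter and codes the remaining consonants 1 to 6. Repeated codes are collapsed, H and W do not separate equal codes while vowels do, and the result is padded with zeros to four characters. Because similar-sounding surnames share a code, they land together in the file, which helps with the name variants mentioned below.

```python
# Digit codes for consonants in American Soundex; vowels, Y, H and W get no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(surname):
    """Four-character American Soundex code of a surname (one common variant)."""
    letters = [c for c in surname.upper() if "A" <= c <= "Z"]
    if not letters:
        raise ValueError("surname has no letters")
    first = letters[0]
    code = [first]
    prev = _CODES.get(first, "")  # the first letter's own code is not repeated
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit:
            if digit != prev:
                code.append(digit)
                if len(code) == 4:
                    break
            prev = digit
        elif ch not in "HW":
            prev = ""  # a vowel (or Y) separates two equal codes
        # H and W are skipped without resetting prev
    return "".join(code).ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```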


Data on Hospitalized Illness Obtained in Household Survey 


In the initial interview, household respondents were asked whether 
any member of the household had been a patient in a hospital overnight 
or longer during the 12 months preceding the month of interview. If 
so, data were obtained on name of hospital, month of admission, length 
of stay in nights, operations performed, and diagnosis. 

Control cards for all families in the survey were made from the inter- 
view schedules, showing name, age, sex, and address (previous ad- 
dresses were included where given). These cards, which did not contain 
any illness data, were filed according to the Soundex code of the sur- 
name. 


Matching of Reports of Hospitalization in Survey and Hospital Records 


For the initial matching operation, it was desired to locate all hos- 
pital records for all individuals included in the survey. This was done 
by two clerks who went through the file of control cards, systematically 
searching the file of hospital abstracts for persons with matching or 
similar names, ages, and addresses. Differences of two years or less in 
age were considered matching, and variations of one letter in name 
were disregarded. After the first check, a recheck discovered only three 
more matching cases. 
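The screening rules above can be sketched as a predicate over pairs of records. Three elements are our assumptions: the record fields, the reading of "one letter" as a single insertion, deletion, or substitution, and the requirement that at least two of name, age, and address agree. The third is suggested by the item combinations in Table 1. In the study itself, clerks made the final judgment; this is only an illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    surname: str
    age: Optional[int]  # None when age was not stated
    address: str


def edit_distance(a, b):
    """Levenshtein distance: fewest single-letter insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def names_agree(a, b):
    return edit_distance(a.upper(), b.upper()) <= 1  # one-letter variation ignored


def ages_agree(a, b):
    return a is not None and b is not None and abs(a - b) <= 2


def addresses_agree(a, b):
    return " ".join(a.upper().split()) == " ".join(b.upper().split())


def is_candidate_match(survey, hospital):
    agreements = (names_agree(survey.surname, hospital.surname)
                  + ages_agree(survey.age, hospital.age)
                  + addresses_agree(survey.address, hospital.address))
    return agreements >= 2
```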

Hospital records for “matched” persons were compared with the 
information obtained in the household survey. In a large proportion of 





4 Nearly 25 per cent of the hospitalizations of residents of San Jose occurred in hospitals outside
the area. Almost half of these were by persons who reported residing in another city at some time during 
the year preceding the survey. The remainder were in hospitals in nearby cities, including Veterans’ 
Administration facilities, the Southern Pacific Hospital, University of California Hospital, etc. The 
hospitalizations outside the area may be different in duration and type from those cases in the immediate 
area, so that rates given here do not represent measures of hospitalization for the population. 
cases, the period of hospitalization was clearly the same in both
sources, in spite of certain discrepancies in reported date of admission, 
length of stay, or diagnosis. In some cases, however, there were hos- 
pital records of illness which had not been reported in the household 
survey (these we call possible “under-reports”); other cases were re- 
ported in the household survey but not located in our file of hospital 
record abstracts (these we call possible “over-reports”). That is, under-
and over-reporting of survey information are defined with respect to the
criterion.

Possible over- and under-reports were subjected to further search. 
Variations in name were considered, and the telephone directory some- 
times verified the fact that two individuals (or families) of the same 
name lived at different addresses in San Jose. For some cases, the name 
of the nearest relative was secured from the hospital to serve as a fur- 
ther means of identification. In this process, five additional matched 
cases were discovered. 

Of some interest may be the degree to which a match was secured on 
the items used for check purposes. Table 1 summarizes the results of 


TABLE 1


NUMBER OF MATCHED HOUSEHOLD SURVEY REPORTS AND
HOSPITAL RECORDS AND NUMBER OF HOUSEHOLD SURVEY
UNDER-REPORTS BY MATCHED ITEMS


                              Matched      Under-
Matched Items                 Records      Reports

TOTAL                           249          39
Name, age, address              221          33
Name, age                        20           2
Name, address(a)                  6           2
Age, address                      2           2

See text for definition of terms.

a Includes those in which age was not stated in the survey.
the matching operation with respect to records of hospitalization in the 
five hospitals subsequent to July 1, 1951. 
Under-Reporting and Over-Reporting of the Event of Hospitalization in 
the Household Survey 


A total of 279 periods of hospitalization were reported in the house-
hold survey in the five hospitals with discharge date after July 1, 1951.⁵


5 Persons in the city segments of the household survey reported 493 hospitalizations in the year 
preceding the month of interview. Of the 374 in the five hospitals in the area, 95, or 25 per cent, were 
Of these, 249, or 89 per cent, were matched with hospital records. Of
the 30 (11 per cent) “over-reports,” i.e., reports in the household survey
which were not matched with hospital records, 20 were identifiable at
the hospital which was named. In eleven of these cases, the individuals
had reported as falling within the check period a hospitalization that
actually occurred before it, in some cases as much as one year earlier; in
seven cases, hospitalization had been reported in the household interview
as overnight or longer when the hospital record showed discharge on the
same day as admission. Table 2 shows a summary of over- and under-reporting.


TABLE 2

REPORTS OF HOSPITALIZATION IN HOUSEHOLD SURVEY AND
IN HOSPITAL RECORDS FOR RESIDENTS OF SAN JOSE IN
SAMPLE, JULY 1, 1951-MAY 31, 1952(a)
WITH NUMBER AND PER CENT MATCHED AND WITH
CLASSIFICATION OF THOSE NOT MATCHED


Category of Report                                        Number    Per cent

Reports of hospitalization in household survey              279
  Matched with hospital records                             249
  Not matched with hospital records (over-reports)           30
    Identified at hospital                                   20
      Stay prior to check period                             11
      Stay not overnight                                      7
      Record of this person, but not of this
        hospitalization                                       2
    Not identified at hospital                               10

Hospitalizations in hospital records for survey
  population                                                288
  Matched with household survey reports                     249
  Not matched with survey reports (under-reports)            39
    Multiple admissions, not all admissions reported
      in survey
    Reported in survey, but not during check period
    Admissions to state mental hospital                       4
    Other types of under-reports


a Check period varied from 7 to 11 months for the different subgroups in the sample.


A total of 288 periods of hospitalization were shown in hospital rec- 
ords for persons in the household survey in the period subject to check. 





for discharges prior to July 1, 1951, the beginning point of the time when hospital discharge records were 
available. The periods of time covered by the reports subject to check varied from 7 to 11 months, 
depending on the month in which the initial interview was taken. While episodes of hospitalization in the
recent past were reported more accurately than those which occurred nearly a year before, differences in 
the range above 7 months were not significant. It is believed that no appreciable error was introduced 
by the use of the varying period subject to check. 







Of these, 39, or 14 per cent, were not reported by the respondents in 
the household survey. In 40 per cent of these cases, persons in the sur- 
vey had more than one admission in the period, and reported at least 
one other.⁶ In another 20 per cent, the period of hospitalization was
reported, but the date given was such that it did not fall within the 
period subject to check. There was failure to report four stays in the 
State mental hospital. (Two such hospitalizations were reported and 
matched among the 249 appearing in both survey and records.) 
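The percentages in the two paragraphs above follow from three counts in the text. There were 279 survey reports and 288 hospital records, and 249 of them appear in both sources. This small sketch derives the over- and under-reports and their rates.

```python
SURVEY_REPORTS = 279    # hospitalizations reported in the household survey
HOSPITAL_RECORDS = 288  # hospitalizations found in hospital records
MATCHED = 249           # found in both sources

over_reports = SURVEY_REPORTS - MATCHED     # in survey only
under_reports = HOSPITAL_RECORDS - MATCHED  # in records only


def pct(part, whole):
    return 100.0 * part / whole


print(f"matched:        {pct(MATCHED, SURVEY_REPORTS):.0f}% of survey reports")
print(f"over-reports:   {over_reports} ({pct(over_reports, SURVEY_REPORTS):.0f}% of survey reports)")
print(f"under-reports:  {under_reports} ({pct(under_reports, HOSPITAL_RECORDS):.0f}% of hospital records)")
```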


Comparison of Common Measures of Hospitalization as Derived from 
Household Sample Survey and from Hospital Records 
The 279 periods of hospitalization reported in the household survey 
and the 288 periods disclosed in hospital records for the same popula- 
tion and period of time form the basis for the computation of a number 
of common measures of hospital utilization shown in Table 3. 
TABLE 3


MEASURES OF HOSPITALIZATION FROM SAN JOSE HOUSEHOLD
SURVEY AND FROM HOSPITAL RECORDS


                                                 From Survey    From Hospital
                                                 Reports        Records
                                                 (A)            (B)

Admissions per 1000 persons per year(a)             65.5           67.9
Days of hospitalization per person per year         .609           .655
Average length of stay per period of
  hospitalization in days                            9.1            9.5
Per cent of admissions with surgery                 43.4           44.4


Note: Because the hospitalizations upon which these rates were based were in five hospitals only,
these should not be considered to represent true rates of hospitalization for the population covered.

a Admissions per 1000 persons per year = $\dfrac{\sum \text{Admissions in period covered}}{\sum \text{Person-months covered by survey}} \times 12{,}000$.

When respondents in a household sample survey were asked about 
periods of hospitalization in the year preceding the survey, the informa- 
tion which they gave yielded an admission rate of 66 per 1000 persons 
in five specified hospitals. When the records of these five hospitals 
were searched for the names of the persons in the household survey, 
records were disclosed which yielded an admission rate of 68 per 1000 
persons. The difference between these two rates is not significant at the 


6 In all but two of these the unreported period of hospitalization was for the same condition as the
period which was reported. 
five per cent level. (In all tests of significance in this paper, the five
per cent level was used.) 
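The admission rate in Table 3 comes from the formula in its footnote: admissions in the period covered, divided by person-months covered, times 12,000. A person observed for 12 months contributes 12 person-months. A thousand such persons therefore contribute 12,000 person-months, and the rate is per 1000 persons per year. The function name and the person-month figure in the usage line below are hypothetical; the paper does not report its person-month total.

```python
def admissions_per_1000_per_year(admissions, person_months):
    """Table 3, footnote (a): admissions / person-months x 12,000."""
    return admissions * 12_000 / person_months


# Hypothetical: 1000 persons each observed for 12 months, 60 admissions.
print(admissions_per_1000_per_year(60, 12_000))  # 60.0
```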

Similarly, the days of hospitalization per person per year, the average 
length of stay, and the per cent of cases with surgery, were all slightly 
higher when obtained from hospital records than when obtained from 
the household survey. None of these differences was statistically signifi-
cant.⁷

The difference in the average length of stay, 9.1 from survey data 
and 9.5 from hospital records, was accounted for by several long peri- 
ods of hospitalization which were not reported in the survey. When the 
239 cases in which length of stay was reported in both sources are com- 
pared, the averages become 9.2 in the survey and 8.6 from hospital 
records.⁸ There was some indication in the data that persons in the
survey tended to over-report longer stays to a greater extent than 
shorter stays. However, when the differences between reports from 
household survey and hospital records for stays of 15 days or less were 
compared with the differences for stays of 16 days or longer, using a 
t-test which takes into account the differences in variance at the two 
ends of the scale, this tendency proved not to be significant at the five
per cent level [3]. 
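The paper cites [3] for a t-test that allows for unequal variances in the two groups of stays. One standard test of this kind is Welch's, sketched below. It is not necessarily the approximation used in [3]. The inputs would be the survey-minus-record differences in length of stay for short and long stays; the numbers shown are hypothetical. The resulting t would then be compared with a t table at the computed degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical survey-minus-record differences in days.
short_stays = [0, 1, -1, 0, 1, 0, 2, -1]  # stays of 15 days or less
long_stays = [3, -5, 8, 0, 12, -2, 6]     # stays of 16 days or longer
print(welch_t(short_stays, long_stays))
```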

Another way in which we may test the usefulness of household sur- 
vey data in the reporting of hospitalization is by comparing the dis- 
tributions of some of the items as reported in the household survey 
with the distributions obtained by reference to hospital records. Such 
comparisons have been made for the month of admission to hospital, 
length of stay, surgical procedure, and diagnosis. 

Table 4 presents the month of admission of the periods of hospitaliza- 
tion as reported by the household survey and as obtained by the check 
of hospital records. Using the chi-square test, the difference between 
these distributions was not significant.⁹
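A chi-square comparison of two distributions over the same categories can be sketched as the usual test of homogeneity on a 2 × k table. The grouping the author used is not stated, so the usage line simply applies the function to the twelve monthly rows of Table 4, leaving out the "not specified" row. Every category must have a nonzero combined count.

```python
def chi_square_homogeneity(a, b):
    """Chi-square statistic and degrees of freedom for two count distributions."""
    total_a, total_b = sum(a), sum(b)
    n = total_a + total_b
    stat = 0.0
    for x, y in zip(a, b):
        column = x + y
        for observed, row_total in ((x, total_a), (y, total_b)):
            expected = row_total * column / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(a) - 1


# Table 4, July 1951 (preceded by "prior to July, 1951") through May 1952.
survey = [12, 31, 33, 30, 23, 35, 29, 31, 19, 19, 7, 7]
records = [11, 34, 38, 35, 22, 34, 26, 34, 16, 17, 7, 7]
print(chi_square_homogeneity(survey, records))
```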

Of some concern to medical care plans and hospital administrators 
is the distribution of cases by length of stay and the proportion of 
total days which are accounted for by stays of various lengths. The 
comparison of the household survey reports and hospital records as to 





7 Since some stays were over 200 days, the standard deviations of the distributions of stays were 
considerably greater than the means. 

8 It is to be noted that the household survey questions used should yield information differing from 
information obtained from hospitals regarding length of stay. The “number of nights” reported in the 
survey, if accurately reported, will always be equal to or one day less than the number of days of hos- 
pitalization shown in hospital records. The median length of stay was 4 in both distributions. 

9 There was agreement on the reported month of admission in 80 per cent of the matched cases.
Discrepancies of one month appeared in fifteen per cent of the cases, and of more than one month in the 
remaining five per cent. 
length of stay is shown in Table 5. The distribution of cases shown in
the first column of Table 5 is not significantly different from the dis-
tribution shown in the second column.¹⁰ The degree of precision which
is desired in the reporting of total days of hospitalization would de-
pend, of course, upon the uses to which the data are to be put. For


TABLE 4


279 REPORTS OF HOSPITALIZATION FROM THE SAN JOSE
HOUSEHOLD SAMPLE SURVEY AND 288 FOR THE
SAME POPULATION FROM HOSPITAL RECORDS,
BY MONTH OF ADMISSION


                                      Survey      Hospital
Month of Admission                    Reports     Records

TOTAL                                   279         288

Prior to July, 1951                      12          11

July                                     31          34
August                                   33          38
September                                30          35
October                                  23          22
November                                 35          34
December                                 29          26

January, 1952                            31          34
February                                 19          16
March                                    19          17
April                                     7           7
May                                       7           7

Not specified or not available            3           7





most practical purposes, however, it would seem that the distributions
of days reported by the two methods are equally useful. It is to be
noted, for example, that in the household survey 71 per cent of the total 
days of hospitalization were reported in stays of 60 days or less, while 
the corresponding figure by hospital records was 68 per cent. 

When individual household survey reports are compared with the 
corresponding hospital records, it is found that among 239 records for 
which the item of length of stay was complete in both sources, there 
was exact agreement in 127 cases, or 53 per cent. In 65 cases, or 27 
per cent, the survey report was greater than the hospital record, and 





10 When grouped into 16 categories, chi-square = 12. 
in 47 cases, or 20 per cent, the survey report was less than the hos-
pital record. The difference between the number of cases reported in
the survey as more than and as less than the hospital record is not sig- 
nificant at the five per cent level, using the sign test [4]. It is to be 
remembered, however, that the definitions of length of stay differed, 


TABLE 5


DISTRIBUTIONS OF PERIODS OF HOSPITALIZATION AND DAYS
OF HOSPITALIZATION AS SHOWN IN THE SAN JOSE
HOUSEHOLD SURVEY AND IN HOSPITAL RECORDS,
BY LENGTH OF STAY IN DAYS


                   Number of           Number of              Cumulative Percentages
Length of Stay     Periods             Days(a)               Periods             Days(a)
in Days(a)      Survey  Hospital    Survey  Hospital     Survey  Hospital    Survey  Hospital

TOTAL             279                2484

[remaining rows, including the 31-60 and 61 & over groups, illegible in source scan]

Not stated          7


a In the household survey reference was made to “nights in hospital.”


and that a report in the survey might be accurately reported as one 
day less than the hospital record. If half of the household survey re- 
ports which were only one day less than the corresponding hospital 
record are considered as matching, the data show a tendency, which 
is statistically significant at the five per cent level using the sign test, 
for over-statement in the household survey.“ 
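The sign test [4] uses only the pairs that disagree and asks whether "survey longer" and "survey shorter" are equally likely. A minimal Python sketch of that comparison, assuming an exact two-sided binomial test (the text does not say whether a one- or two-sided test or a normal approximation was used); `sign_test_p` is a hypothetical helper name:

```python
from math import comb


def sign_test_p(n_plus: int, n_minus: int) -> float:
    """Exact two-sided sign test p-value.

    n_plus / n_minus are the numbers of pairs in which the survey report was
    greater / less than the hospital record; exact agreements (ties) are
    left out before calling this function.
    """
    n = n_plus + n_minus
    k = min(n_plus, n_minus)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# 65 survey reports longer, 47 shorter (the 127 exact agreements are dropped)
p_raw = sign_test_p(65, 47)
# Footnote 11 assumption: 17 of the 47 count as matching, leaving 65 against 30
p_adjusted = sign_test_p(65, 30)
print(f"unadjusted p = {p_raw:.3f}, adjusted p = {p_adjusted:.4f}")
```

With these counts the unadjusted comparison (65 against 47) falls short of significance at the five per cent level, while the adjusted one (65 against 30) is well beyond it, in line with the conclusions stated in the text.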

11 With this assumption, cases in which there was agreement would total 144, and cases in which
the survey report was less than the hospital record would total 30.

One item of information gathered in the survey concerned the operation(s)
performed during a stay in the hospital. These items were
coded according to the first digit of the operation code in the Standard
Nomenclature [15]. Table 6 gives a summary of the reports from the
two sources.

TABLE 6

DISTRIBUTION OF SURGICAL PROCEDURES IN 279 REPORTS
OF HOSPITALIZATION FROM THE SAN JOSE HOUSEHOLD
SURVEY AND 288 FROM THE HOSPITAL RECORDS

                                               Household    Hospital
Surgical Procedure                             Survey       Records*

TOTAL                                            279          288

Surgery not stated or record not available
Without surgery
With surgery
  Incision
  Excision
  Amputation
  Introduction
  Endoscopy
  Repair
  Destruction
  Suture
  Manipulation
  Not classifiable above

[Row counts not recoverable from the scan.]

* Twelve hospital records showed two procedures and one showed three. In these cases the procedure
which matched the household survey report was counted.


In coding these procedures such terms as “sinus operation,” “oper- 
ated anorectal,” “kidney operation,” and “general surgical work” ap- 
peared in more than ten per cent of the household survey reports. 
These terms were not classifiable in the system used above. However, 
descriptions which probably referred to only one procedure, such as 
“hernia operation” (repair), were given the appropriate code. 

It is apparent from inspection of Table 6 that the large group in the 
“Not classifiable” category for the household survey reports makes the 
picture quite different from that shown by the hospital records. This 
difference is statistically significant. Apparently, then, household survey
reports do not give as specific descriptions of surgical procedures
as can be secured from hospital records.

12 The distributions in Table 6 were grouped into four categories, Incision, Excision, Repair, and
Other, and the chi-square test was applied.
Note on Matched Cases: There were 240 cases in which a report of surgery was available in both

Diagnoses given in the household survey and those on the hospital 
records were coded according to the 3-digit codes of the International 
Statistical Classification of Diseases, Injuries and Causes of Death. In 
both types of records, the diagnoses were coded in order of mention 
except that injuries were given precedence over other conditions. In five 
per cent of the survey reports there was more than one diagnosis, while 
there were multiple diagnoses in nearly nineteen per cent of the hospital 
records. 

When the primary diagnoses are distributed according to a 100- 
group category system, there are, of course, many categories in which 
there are few cases. These distributions are shown in Table 7. In order 
to test the significance of the difference here, the frequencies were 
grouped into 24 categories such that in no category were there fewer
than five cases in the hospital records. Differences between the two
distributions were not significant. A further consolidation of the
distributions into eleven groups increased the degree of correspondence.
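The text states only that the diagnostic groups were combined until no category had fewer than five hospital-record cases, and that the two distributions were then compared. One way to carry out such a grouping and test is sketched below in Python. The merging rule (combine adjacent categories in table order until the hospital count reaches five) is an assumption, the function names are hypothetical, and the counts in the example are invented for illustration, not taken from Table 7.

```python
def pool_categories(survey, hospital, min_count=5):
    """Merge adjacent categories (in table order) until each pooled group has
    at least `min_count` hospital-record cases; a short remainder at the end
    is added to the last complete group."""
    pooled_s, pooled_h = [], []
    acc_s = acc_h = 0
    for s, h in zip(survey, hospital):
        acc_s += s
        acc_h += h
        if acc_h >= min_count:
            pooled_s.append(acc_s)
            pooled_h.append(acc_h)
            acc_s = acc_h = 0
    if acc_s or acc_h:
        if pooled_h:
            pooled_s[-1] += acc_s
            pooled_h[-1] += acc_h
        else:
            pooled_s.append(acc_s)
            pooled_h.append(acc_h)
    return pooled_s, pooled_h


def chi_square_2xk(row_a, row_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    n_a, n_b = sum(row_a), sum(row_b)
    total = n_a + n_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        column = a + b
        for observed, row_total in ((a, n_a), (b, n_b)):
            expected = row_total * column / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(row_a) - 1


# Invented counts for ten diagnostic groups (NOT the Table 7 figures)
survey_counts = [10, 2, 7, 1, 3, 12, 4, 0, 6, 9]
hospital_counts = [7, 3, 5, 2, 1, 14, 3, 2, 5, 8]

s, h = pool_categories(survey_counts, hospital_counts)
stat, df = chi_square_2xk(s, h)
print(s, h, round(stat, 2), df)
```

The statistic would then be referred to a chi-square distribution with k - 1 degrees of freedom, for instance through a printed table or a library routine such as SciPy's `scipy.stats.chi2.sf`.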

It would appear that for most practical purposes, a description of the
diagnoses of hospitalized illness to be obtained from a household survey
like the one conducted in San Jose will be as useful as one obtained
through reference to hospital records.


SUMMARY 


In summary, most of the medical record sources which are available 
for validation checks on household survey reports of illness do not pro- 
vide an opportunity for a comprehensive check on both the over- and 
under-reporting of illness. Only with respect to hospitalized illness was 
it possible to study the net error in reporting. 

sources. In 133 of these (which included episiotomies and stitches incident to normal deliveries) there
was a report of no surgery on both records. On one record the survey report was "Broke right ankle.
Surgery: None," while the hospital record showed "Fracture of the int. and ext. trimalleolar of right
ankle, Surgery: Reduction of fractures." Here, obviously, the respondent's concept of what constitutes
surgery differed from that generally understood. In two other cases in which the survey reported surgery
while the hospital record showed none, it appears that the hospital record may have been in error. (A)
Survey report: "Osteomyelitis in left arm, operated on left arm for drainage." Hospital record:
"Osteomyelitis left radius, Surgery: No." (B) Survey report: "Adhesions. Female trouble due to
hysterectomy. Adhesions removed." Hospital record: "Adhesions cecum and ascending colon. Rt. Ovarian cyst
with multiple varicosities. Adhesions bands intestine two to lower pelvic area, Surgery: No."

In 104 cases there was a report of surgery in both sources. In 88 of these, or eighty-five per cent,
there was agreement as to the surgical procedure when classified into ten categories according to the
Standard Nomenclature.

13 See footnote on Table 7.

14 In the 242 cases for which diagnoses were available both from the household survey and from the
hospital record, there was agreement between one pair of diagnoses at the 800-group level in 61 per cent
of the cases. At the 100-group level, this agreement increased to 76 per cent, and when the data are
arranged into 15 groups, the agreement was 85 per cent.

In this study, reports of hospitalization during a preceding period
ranging from 7 to 11 months obtained by interview of households in
San Jose, California, were checked against hospital records in five
hospitals. Records were matched for 249 periods of hospitalization. 
Thirty reports in the survey could not be matched with hospital records.
These over-reports included hospitalization which was not overnight,
stays which occurred as long as one year earlier than the study
period, as well as ten which could not be identified at the named hos- 
pital. Thirty-nine periods of hospitalization were not reported by the 
respondents in the survey. Nearly half of these under-reports were for 
persons with multiple admissions, who reported at least one other 
hospitalization. Four were unreported admissions to the State mental 
hospital. 
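The counts in this paragraph reconcile with the table totals: the 279 survey reports are the 249 matched periods plus the 30 over-reports, and the 288 hospital records are the 249 matched periods plus the 39 under-reports. A short Python sketch of that bookkeeping, with the net error defined (as an assumption of this sketch) as survey count minus hospital count:

```python
matched = 249        # periods found in both the survey and the hospital records
over_reports = 30    # survey reports with no matching hospital record
under_reports = 39   # hospital records not reported in the survey

survey_total = matched + over_reports       # survey reports of hospitalization
hospital_total = matched + under_reports    # hospital records for the same people
net_error = survey_total - hospital_total   # negative means net under-reporting
relative_net_error = net_error / hospital_total

print(survey_total, hospital_total, net_error, f"{relative_net_error:.1%}")
```

On this definition the survey falls short by 9 periods, about 3 per cent of the hospital count, even though over- and under-reports each run to more than 10 per cent.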

Admission rates based on survey reports did not differ significantly 
from those based on hospital records for the same population. Similarly, 
days of hospitalization per person per year, average length of stay per 
period of hospitalization, and per cent of admissions with surgery were 
calculated accurately from household survey data. 
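The note on matched cases gives enough counts to express agreement on surgery as simple proportions: of the 240 matched cases with a surgery item in both sources, 133 agreed on no surgery and 104 on surgery, leaving 3 discordant cases, and 88 of the 104 agreed on the procedure. A small Python sketch of those proportions (the variable names are illustrative):

```python
both_no_surgery = 133
both_surgery = 104
matched_with_surgery_item = 240
same_procedure = 88   # of the 104, same category among ten Standard Nomenclature groups

discordant = matched_with_surgery_item - both_no_surgery - both_surgery
surgery_agreement = (both_no_surgery + both_surgery) / matched_with_surgery_item
procedure_agreement = same_procedure / both_surgery

print(discordant, f"{surgery_agreement:.1%}", f"{procedure_agreement:.1%}")
```

The result, about 99 per cent agreement on whether surgery was done against about 85 per cent on the procedure itself, matches the pattern described in the summary.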

Distributions of admissions from household survey reports of hospitalizations
by month of admission, length of stay, and diagnosis were
similar to the distributions obtained from hospital records. Whether or
not surgery was performed was reported accurately in the household
survey, but the description of surgical procedure was not as precise as
that obtained from hospital records.

This study shows that reports of hospitalization obtained in household
sample surveys are sufficiently accurate to be used for many purposes
in lieu of hospital record data.


TABLE 7

DISTRIBUTION OF SOLE OR PRIMARY DIAGNOSES IN 279
REPORTS OF HOSPITALIZATION FROM THE SAN
JOSE HOUSEHOLD SURVEY AND 281* FROM
THE HOSPITAL RECORDS

                                                    Number of            Per cent of
                                                    Hospitalizations     Total
Diagnostic Category¹                                Survey   Hospital    Survey   Hospital
                                                    Reports  Records     Reports  Records

TOTAL                                                 279      281*        100      100

Infective and parasitic diseases                       10        7         3.6      2.5
  Tuberculosis of respiratory system                    2        1
  Food poisoning                                        1        0
  Acute poliomyelitis, infectious encephalitis
    and late effects                                    4        4
  Other infective and parasitic diseases                3        2

Psychoneuroses, mental disorder and ill-defined
nervous conditions
  Mental, psychoneurotic and personality disorders
  Epilepsy

Diseases of eyes
  Other inflammation of eye
  Cataract
  Other diseases of eye

Rheumatic fever, arthritis, muscular rheumatism
and sciatica
  Rheumatic fever and chronic rheumatic heart disease
  Arthritis, not elsewhere classified

Diseases of circulation and symptoms referable to it
  Arteriosclerotic heart and coronary disease
  Other diseases of heart
  Hypertensive disease
  Diseases of arteries
  Varicose veins of lower extremities
  Haemorrhoids
  Other diseases of circulatory system
  Symptoms referable to cardiovascular and lymphatic system

Colds, influenza and acute respiratory infections
  Other acute upper-respiratory infections
  Influenza

Other respiratory diseases and symptoms
  Pneumonia
  Hypertrophy of tonsils and adenoids
  Chronic sinusitis and nasal diseases
  Pleurisy, empyema and lung abscess
  Symptoms referable to respiratory system

Disorders of upper gastro-intestinal tract
  Ulcer of stomach
  Ulcer of duodenum
  Other disorders of stomach and duodenum
  Symptoms referable to upper gastro-intestinal tract

Disorders of lower gastro-intestinal tract
  Appendicitis
  Hernia and intestinal obstruction
  Gastro-enteritis and colitis
  Cholelithiasis and cholecystitis
  Other disorders of digestive system

Disorders of genito-urinary system and complications
of childbearing
  Nephritis
  Calculi of urinary system
  Other diseases of urinary system
  Hyperplasia of prostate
  Other diseases of male genital organs
  Diseases of breast
  Disorders of menstruation and menopause
  Other diseases of female genital organs
  Complications of pregnancy, childbirth and puerperium
  Symptoms referable to genito-urinary system

Diseases of skin and skeletal system, malformations    10       14         3.6      5.0
  Other infections of skin and subcutaneous tissue
  Other diseases of skin and subcutaneous tissue
  Osteomyelitis and periostitis
  Other diseases of bone
  Other diseases of joints except ankylosis
  Other acquired musculoskeletal deformities
  Congenital malformations

Other diseases and symptoms
  Neoplasms
  Other allergic, endocrine, metabolic and blood diseases
  Vascular diseases affecting central nervous system
  Other diseases of central nervous system
  Diseases peculiar to early infancy
  Symptoms referable to limbs and back
  Other and ill-defined symptoms and conditions

Injuries
  Fractures
  Dislocations, sprains and strains
  Internal injury of chest, abdomen, pelvis and head injury without fracture
  Lacerations and open wounds
  Burns
  Other and unspecified effects of external causes       7        6

[Counts for the remaining categories are not recoverable from the scan.]

1 See reference [9]. Groups used are from "Drafts of Five Special Condensations and Expansions
of the International Statistical Classification of Diseases, Injuries, and Causes of Death to Provide for
Presentation in Convenient Form of Statistics of Sickness Surveys, Sickness Absenteeism, and Hospital
Diagnosis," received from I. M. Moriyama, Secretary, U. S. Committee of Vital and Health Statistics,
with letter dated 1-6-53.

2 Categories in which there were no reported periods of hospitalization are not included.

* Excludes 7 cases for which diagnoses were not available.
BUSINESS FAILURES: ANOTHER EXAMPLE OF THE
ANALYSIS OF FAILURE DATA


K. S. Lomax
University of Manchester


The analyses of failure data given by Davis [1] all involve 
essentially constant or increasing conditional probabilities 
of failure. For business failures, however, it is reasonable to 
expect monotonically decreasing conditional probabilities. 
An analysis of data on failures of four types of business in 
Poughkeepsie, New York, from 1844 to 1926 [2] confirms this 
expectation. The conditional probabilities of failure for these 
four series are well described by both exponential and hyper- 
bolic functions. 


THE interesting and stimulating paper by D. J. Davis [1] on failure
data is most suggestive to the economist.
Broadly, Davis analyzes three types of failure theory: 


(a) The normal theory of failure, in which the failure probability 
density function is Gaussian. 

(b) Human mortality, characterized by rapid increase of the con- 
ditional density function after middle-age. 

(c) Exponential theory of failure, in which the conditional density 
function is constant. 


In (a) uniformly and in (b) after the very early years of life the con- 
ditional density function of failure probability with time is strictly 
monotonic increasing. In (c) it is constant. 

The economist immediately thinks of business failures, in which it is
reasonable to expect the conditional density function strictly to decrease
monotonically. The purpose of this note, then, is to draw attention
to a fourth category of failure theory:


(d) Business mortality. It is fairly well established that with most 
types of business the early years are the most difficult. It is then 
that mortality is highest. The longer a business survives, gener- 
ally, other things being equal, the smaller becomes the prob- 
ability of failure. 


Take, for example, the useful and comprehensive data compiled by
R. G. and A. R. Hutchinson and Mabel Newcomer [2]. Their Table I,
showing the length of life for business enterprises established in
Poughkeepsie between 1844 and 1926, can serve as a basis for calculation of
F(t), for different values of t, where

F(t) = cumulative probability of failure in the interval (0, t).

The results of these calculations, omitting wholesale businesses since
the sample there was small in comparison with the other sets, are
shown in Table 1.


TABLE 1

CUMULATIVE PROBABILITY OF BUSINESS FAILURE
IN POUGHKEEPSIE, 1844-1926

Age in                  Manu-
years      Retail      facture     Craft      Service

  1        0.296       0.231       0.307      0.327
  2        0.438       0.346       0.454      0.457
  3        0.532       0.469       0.551      0.551
  4        0.594       0.547       0.607      0.618
  5        0.643       0.602       0.660      0.669
  6        0.684       0.655       0.697      0.708
  7        0.715       0.678       0.727      0.743
  8        0.741       0.702       0.753      0.769
  9        0.762       0.726       0.772      0.792
 10        0.782       0.746       0.791      0.812

Source: Calculated from Hutchinson, Hutchinson, and Newcomer [2].


In all these cases the graph of F(t) takes the shape shown in Figure 1.








Now, if

f(t) = probability density function of failure time,

so that f(t) dt is the probability of failure in the infinitesimal interval (t, t + dt),
then f(t) = F'(t), and two methods are available for estimation of f(t)
from the above data. One is to measure the slopes of the F(t) graphs
at the different values of t. The other is to use the difference formula

$$F'(t) = \Delta F(t) - \tfrac{1}{2}\Delta^{2} F(t) + \tfrac{1}{3}\Delta^{3} F(t) - \cdots$$
The former method seems to be the more satisfactory here, and the 
results of applying it to the data of Table 1 are shown in Table 2. 
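The second method, the forward-difference series for F'(t), can be sketched in Python using the Retail column of Table 1. The sketch assumes unit spacing (one year), takes F(0) = 0, and truncates the series after three differences; `forward_differences` and `derivative_series` are hypothetical helper names.

```python
# Table 1, Retail column, with F(0) = 0 prepended (ages 0, 1, ..., 10 years)
F_retail = [0.0, 0.296, 0.438, 0.532, 0.594, 0.643, 0.684,
            0.715, 0.741, 0.762, 0.782]


def forward_differences(values, t, order):
    """Return [ΔF(t), Δ²F(t), ..., Δ^order F(t)] for equally spaced values."""
    if t + order >= len(values):
        raise ValueError("not enough later values for the requested order")
    row = values[t:t + order + 1]
    diffs = []
    for _ in range(order):
        row = [b - a for a, b in zip(row, row[1:])]
        diffs.append(row[0])
    return diffs


def derivative_series(values, t, order=3):
    """F'(t) ≈ ΔF - Δ²F/2 + Δ³F/3 - ..., truncated after `order` terms."""
    diffs = forward_differences(values, t, order)
    return sum((-1) ** (k + 1) * d / k for k, d in enumerate(diffs, start=1))


for age in range(0, 4):
    print(age, round(derivative_series(F_retail, age), 3))
```

At t = 1 this gives about 0.171, close to the slope value of 0.175 in Table 2. Near t = 0, where F(t) bends sharply, the truncated series gives only about 0.41 against the 0.57 read from the slope, which may be one reason the graphical method was found more satisfactory here.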


TABLE 2

PROBABILITY DENSITY OF BUSINESS FAILURE IN
POUGHKEEPSIE, 1844-1926

Age in                  Manu-
years      Retail      facture     Craft

  0        0.57          ...       0.5
  1        0.175         ...       0.213
  2        0.107         ...       0.102
  3        0.071         ...       0.072
  4        0.056         ...       0.056
  5        0.045         ...       0.044
  6        0.036         ...       0.034
  7        0.028         ...       0.028
  8        0.023         ...       0.024
  9        0.020         ...       0.020
 10        0.018         ...       0.017

[Manufacture column not recoverable from the scan.]

Source: Computed from Table 1.


Thus, f(t) can generally be represented as in Figure 2.
From the values of f(t) and F(t), following Davis, we calculate Z(t),
the conditional density function of failure probability with time, in
other words, the instantaneous probability rate of failure at time t
conditional upon non-failure prior to t:

$$Z(t) = \frac{f(t)}{1 - F(t)}.$$

These results are shown in Table 3.
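The conditional density is a simple ratio of two tabulated quantities, so it can be checked directly. A Python sketch that recomputes it for the Retail series from the values of Tables 1 and 2 (ages 1 to 10; the variable names are illustrative):

```python
# Retail values, ages 1 to 10 years
F = [0.296, 0.438, 0.532, 0.594, 0.643, 0.684, 0.715, 0.741, 0.762, 0.782]  # Table 1
f = [0.175, 0.107, 0.071, 0.056, 0.045, 0.036, 0.028, 0.023, 0.020, 0.018]  # Table 2

# Conditional density of failure: Z(t) = f(t) / (1 - F(t))
Z = [round(ft / (1 - Ft), 3) for ft, Ft in zip(f, F)]

# Retail column of Table 3 for the same ages, for comparison
Z_table3 = [0.249, 0.190, 0.152, 0.138, 0.126, 0.114, 0.098, 0.089, 0.084, 0.083]
print(Z)
```

Rounded to three decimals, the recomputed Retail values agree with the Retail column of Table 3.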


TABLE 3

CONDITIONAL PROBABILITY DENSITY OF BUSINESS
FAILURE, POUGHKEEPSIE, 1844-1926

Age in                  Manu-
years      Retail      facture     Craft

  0        0.57        0.365       0.5
  1        0.249       0.198       0.307
  2        0.190       0.182       0.187
  3        0.152       0.169       0.160
  4        0.138       0.148       0.142
  5        0.126       0.128       0.129
  6        0.114       0.107       0.112
  7        0.098       0.093       0.103
  8        0.089       0.077       0.097
  9        0.084       0.051       0.088
 10        0.083       0.028       0.081

Source: Calculated from Tables 1 and 2.


Z(t) is of the form shown in Figure 3. 
There is really little purpose to be served by searching for analytical
expressions representing this behavior. This could only be useful if it
were feasible to obtain general support from extraneous sources for a
particular form of relationship. The only such support in this case is
in relation to such trivialities as

$$F(t),\ f(t),\ Z(t) \ge 0; \qquad F(0) = 0; \qquad f(0) = Z(0); \qquad F(t) \text{ monotonic increasing.}$$


It is, however, of interest to record that a good fit to the Z(t) values
can be obtained, in each case, either by the exponential function

$$Z(t) = a e^{-bt}$$

or the hyperbola

$$Z(t) = \frac{b}{a + t}.$$

The latter appears to be the more appropriate for the Retail, Craft,
and Service groups, while in the case of Manufacturing trades the
exponential gives the better fit. These functions were fitted to the
data using the transformations

$$\log_e Z(t) = \log_e a - bt, \quad \text{linear in } t \text{ and } \log_e Z(t),$$

and

$$\frac{1}{Z(t)} = \frac{a}{b} + \frac{t}{b}, \quad \text{linear in } t \text{ and } \frac{1}{Z(t)}.$$


The correlation coefficients corresponding to these linear forms are 
shown in Table 4. 
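Both fits reduce to straight-line regressions on transformed values. A Python sketch for the Retail series, assuming (since the text does not say) that only ages 1 to 10 were used and that the reported coefficient is the ordinary Pearson correlation of t with the transformed variable; `linear_fit` is a hypothetical helper.

```python
import math

ages = list(range(1, 11))
# Retail column of Table 3, ages 1 to 10 (t = 0 omitted in this sketch)
Z = [0.249, 0.190, 0.152, 0.138, 0.126, 0.114, 0.098, 0.089, 0.084, 0.083]


def linear_fit(x, y):
    """Least-squares line y = c0 + c1*x and the Pearson correlation of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)


# Exponential: log Z = log a - b t
c0, c1, r_exp = linear_fit(ages, [math.log(z) for z in Z])
a_exp, b_exp = math.exp(c0), -c1

# Hyperbola: 1/Z = a/b + t/b
c0, c1, r_hyp = linear_fit(ages, [1 / z for z in Z])
b_hyp = 1 / c1
a_hyp = c0 * b_hyp

print(f"exponential: a={a_exp:.3f} b={b_exp:.3f} r={r_exp:.3f}")
print(f"hyperbola:   a={a_hyp:.3f} b={b_hyp:.3f} r={r_hyp:.3f}")
```

For this series the hyperbola gives r of about 0.99, matching the 0.99 printed for Retail in Table 4, and the exponential a somewhat weaker correlation of about 0.97 in absolute value (negative, because log Z falls as t rises).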


TABLE 4

CORRELATION COEFFICIENTS FOR FUNCTIONS
FITTED TO DATA OF TABLE 3

Type of business     Exponential     Hyperbola

Retail                   ...            0.99
Manufacture              ...            0.83
Craft                    ...            0.99
Service                  ...            0.98

[Exponential column not recoverable from the scan.]

One advantage of the hyperbolic "law" is that the expressions for
f(t) and F(t) remain fairly simple. If

$$Z(t) = \frac{b}{a + t},$$

then

$$F(t) = 1 - \left(\frac{a}{a + t}\right)^{b}$$

and

$$f(t) = \frac{b}{a}\left(\frac{a}{a + t}\right)^{b + 1},$$

whereas

$$Z(t) = a e^{-bt}$$

leads to

$$F(t) = 1 - e^{(a/b)\left(e^{-bt} - 1\right)}$$

and

$$f(t) = a\, e^{-bt + (a/b)\left(e^{-bt} - 1\right)}.$$


Both alternatives conform to the desirable boundary conditions and 
monotonic behavior exhibited by the data. 
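The closed forms can be verified numerically: each pair should satisfy f(t) = Z(t)[1 - F(t)], F(0) = 0, and f(t) = F'(t). A Python sketch with illustrative parameter values (a = 3.8, b = 1.1 for the hyperbola; a = 0.3, b = 0.12 for the exponential), chosen only for the check and not taken from the paper; the function names are hypothetical.

```python
import math


def hyperbolic_law(a, b, t):
    """Z, F and f for the hazard Z(t) = b / (a + t)."""
    z = b / (a + t)
    big_f = 1 - (a / (a + t)) ** b
    small_f = (b / a) * (a / (a + t)) ** (b + 1)
    return z, big_f, small_f


def exponential_law(a, b, t):
    """Z, F and f for the hazard Z(t) = a * exp(-b t)."""
    z = a * math.exp(-b * t)
    big_f = 1 - math.exp((a / b) * (math.exp(-b * t) - 1))
    small_f = a * math.exp(-b * t + (a / b) * (math.exp(-b * t) - 1))
    return z, big_f, small_f


def check(law, a, b, ts, h=1e-5):
    """Largest deviations from f = Z(1 - F) and from f = dF/dt (central difference)."""
    worst_hazard = worst_slope = 0.0
    for t in ts:
        z, big_f, small_f = law(a, b, t)
        worst_hazard = max(worst_hazard, abs(small_f - z * (1 - big_f)))
        slope = (law(a, b, t + h)[1] - law(a, b, t - h)[1]) / (2 * h)
        worst_slope = max(worst_slope, abs(small_f - slope))
    return worst_hazard, worst_slope


ages = [0.5, 1, 2, 5, 10]
print(check(hyperbolic_law, 3.8, 1.1, ages))
print(check(exponential_law, 0.3, 0.12, ages))
```

Both deviations come out at rounding-error size for either law, a quick confirmation that each F(t) and f(t) pair is consistent with its Z(t).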

I hope shortly to carry out a more detailed analysis of business mor- 
tality covering British as well as American data. A cursory examination 
has already indicated that British experience does not always com- 
pletely accord with American. 
CYCLICAL FLUCTUATIONS IN FOUNDRY ACTIVITY


LLOYD SAVILLE
Duke University


I. INTRODUCTION 


THE extreme sensitivity of foundry operations to business change
has been apparent for many years [7]. Only recently, however, has
sufficient information been available to permit an analysis of the cycli- 
cal movements of foundry output as a whole. An adequate accumula- 
tion of data in a number of the Facts for Industry Series of the Bureau 
of the Census, some of it collected for the first time during World War 
II, now makes possible the construction of a seasonally adjusted index 
of foundry activity covering a period of business fluctuations. 

In general, foundries produce metal parts or castings to the custom 
specification of local firms in durable goods industries; consequently 
foundry activity is influenced by cyclical fluctuations in a wide range 
of geographic and industrial areas. Five general characteristics of the 
industry affect its sensitiveness to change: (1) The production of found- 
ries is dominated by changes in the demand for a variety of products 
commonly classified as durable consumers’ goods, investment goods, 
and war materials: castings are employed as bases for pumps, lathes, 
and presses; as frames for pianos, lawnmowers, and locomotives; as 
wheels for railroad cars, airplanes, and machines; and as component 
parts for lamps, engines, and motors. (2) Castings are made by several 
thousand establishments operating in many geographically separated 
markets; under these atomistic conditions, foundry production de- 
scribes the activities of a wide range of firms and tends to minimize 
the influence of one or a few firms on production totals. (3) The pro- 
duction of castings made from different metals is dominated by various 
technical requirements: use of aluminum castings is regulated by the 
level of output of such objects as airplanes and portable tools, where 
lightness of weight is a major consideration; the quantity of castings 
made from brass and bronze, malleable iron, steel, and gray iron re- 
flects, respectively, the rate of manufacture of corrosion-resistant fit- 
tings needed on ships and in chemical plants, shock-withstanding parts 
for railroads, armor plates for tanks, and general components for pro- 
ducers’ and consumers’ goods. (4) Since inventories of rough castings 
tend to be small, the production of castings is closely related to cur- 
rent demand. Foundrymen usually make parts to the specification of 
the individual consumer and so find it difficult to accumulate castings
for future orders. (5) Finally, the makers of castings form not only a
variety of items but also a sizeable product. In 1947 their value added
to product totaled almost two billion dollars, or approximately 2.5 per
cent of the value added by the producers of all the manufactured goods
in the United States (Table 1).


AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1954


TABLE 1

ESTIMATES OF ACTUAL AND RELATIVE VOLUME OF
PRODUCT, VALUE OF PRODUCT, AND VALUE ADDED
TO PRODUCT BY CERTAIN CLASSES OF FOUNDRIES
AND ALL MANUFACTURERS IN THE UNITED STATES, 1947

                                        Actual figures
                              Quantity of    Value of       Value added
Group of                      product in     product in     to product in
manufacturers                 pounds         dollars        dollars
                              (000,000)      (000,000)      (000,000)

Aluminum and aluminum
  base foundries                    468            256            135
Copper and copper
  base foundries                  1,061            394            207
Malleable-iron foundries          1,797            231            152
Steel foundries                   2,532            389            252
Gray-iron foundries              22,271          1,773          1,108

Total of all foundries           28,129          3,043          1,854

Total of all manufacturers    not available  not available     74,426

[Percentage columns (quantity of product, value of product, value added
to product) not recoverable from the scan.]

Source: [12, Vol. II, pp. 21, 535, 546, 560, and 564]. Data adjusted to make coverage comparable
with quantities presented in Table 3.
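The shares implied by the actual figures of Table 1 can be recomputed directly. A Python sketch; the shares it prints are recomputed here on the all-foundries totals (and, for value added, on all manufacturers) and are not necessarily the printed percentages, whose base is not stated in the surviving text.

```python
# Actual figures from Table 1, 1947:
# (quantity in millions of pounds, value of product and value added in millions of dollars)
foundries = {
    "Aluminum and aluminum base": (468, 256, 135),
    "Copper and copper base": (1_061, 394, 207),
    "Malleable iron": (1_797, 231, 152),
    "Steel": (2_532, 389, 252),
    "Gray iron": (22_271, 1_773, 1_108),
}
all_manufacturers_value_added = 74_426

# Column totals, which should reproduce the "Total of all foundries" row
totals = [sum(col) for col in zip(*foundries.values())]
print("Totals (quantity, value, value added):", totals)

# Each class as a per cent of all foundries, column by column
for name, row in foundries.items():
    shares = [100 * x / total for x, total in zip(row, totals)]
    print(f"{name:28s}" + "".join(f"{s:8.1f}" for s in shares))

foundry_share = 100 * totals[2] / all_manufacturers_value_added
print(f"Foundries' share of all manufacturing value added: {foundry_share:.2f}%")
```

The column sums reproduce the 28,129, 3,043 and 1,854 of the total row, and the value-added share comes to about 2.49 per cent, the "approximately 2.5 per cent" cited in the text.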


2. AVAILABLE FOUNDRY INFORMATION 


Almost all of the monthly information concerning the operations of 
foundries in the United States is found in the Current Statistical Service 
and the Facts for Industry series of the Bureau of the Census. Data for 
brief periods and for special groups of producers have been collected 
by other governmental agencies (e.g., Office of Price Administration) 
and trade associations (e.g., Malleable Founders’ Society). Table 2 
lists the dates and designations of publications of the Bureau of the 
Census in which monthly reports of foundry activity are available. 
Series dating from 1923 and 1926 are shown for malleable iron and steel, 
but only from 1942 and 1943 for castings made from aluminum, copper, 
and gray iron. Consideration has not been given to castings of magnesium
and lead, for they are of minor importance, comprising in total
about 2 per cent (by weight) of all nonferrous castings shipped in
1953. Although castings made from zinc approximate in volume
castings made from aluminum, they have been excluded from the
inquiry for two reasons: (1) Comparable data concerning them are not
available prior to 1946. (2) More important, the technology and
market structure associated with die casting, the predominant way
of forming zinc, are quite different from the techniques associated with
the sand and permanent molding methods used in making castings of
other metals.


TABLE 2

BUREAU OF THE CENSUS PUBLICATIONS OF MONTHLY
FOUNDRY OPERATING INFORMATION IN THE
UNITED STATES, 1923 TO 1954

Metal       Start of        Publications in which data are included        Designations of
            series          (Dates are inclusive.)                         current publications

Aluminum    January, 1942   Series 1-1, 1-3, 1-6, and 1-7, Jan. 1942 to     M24E
                            Sept. 1945; Series M24B, Oct. 1945 to Dec.
                            1945; Series M24E, Jan. 1946 to present.

Copper      January, 1942   Series 1-1, 1-3, 1-6, and 1-7, Jan. 1942 to     M24E
                            Dec. 1945; Series M24E, Jan. 1946 to present.

Malleable   May, 1923       Current Statistical Service, May, 1923 to       M21-1 and M21C
Iron                        June, 1944; Series 30-7, Jan. 1943 to Sept.
                            1945; Series M21B, Oct. 1945 to Dec. 1950;
                            Series M21-1 and M21C, Jan. 1950 to present.

Steel       January, 1926   Current Statistical Service, Jan. 1926 to       M21-1 and M21C
                            June, 1944; (Production of Commercial Steel
                            Castings, only) Series 30-1, July, 1943 to
                            Sept. 1945; Series M22A, Oct. 1945 to Dec.
                            1950; Series M21-1 and M21C, Jan. 1951 to
                            present.

Gray Iron   January, 1943   Series 30-2, Jan. 1943 to June 1944; Series     M21-1 and M21C
                            30-5, July, 1944 to Sept. 1945; Series M21A,
                            Oct. 1945 to Dec. 1950; and Series M21-1 and
                            M21C, Jan. 1951 to present. (Miscellaneous
                            castings series first available Oct. 1944)

Currently, only unfilled-order and shipment data are being pub- 
lished; shipment figures are the more descriptive of the two. Unfilled 
orders tend to fluctuate not only with the demand for castings, but 
also with the availability of casting facilities, for customers often place 
duplicate orders with a number of foundries during boom times in 
the hope of achieving prompt delivery from one of them. This complexity
and the fact that figures are not available prior to December,
1945 for malleable iron and steel castings reduce the value of unfilled
orders as a measure of cyclical change. Since inventories of finished 
castings held by foundries usually are not large, shipment data corre- 
spond very closely to production information; when both production 
and shipment figures are available concurrently, no material differences 
are evident. 

Shipment series for the five foundry industries, inflated to achieve 
full coverage and comparability, are shown in Table 3. The universe 
of firms has been ascertained with some precision, for the reports sub- 
mitted by each company during World War II were used in connection 
with the allocation of scarce materials; even establishments which 
may have dealt in the black market probably complied with the filing 
requirements of the regulations in order to secure a legitimate quota 
of metal. 

The coverage of monthly releases varies widely; in the case of mal- 
leable-iron foundries, virtually every producer reported every month 
during the last ten years, while in the case of each of the other foundry 
industries, reports were collected from all known producers in 1946 and 
1950. Between these complete enumerations, samples of varying size 
were assembled each month. The Census Bureau revised the monthly 
reports for 1945 and 1946 on the basis of the annual reports collected 
from the universe of firms in 1946, and revised the monthly reports 
for 1949 and 1950 on the basis of the 1950 study. Similar revisions
have been made in the monthly data for 1947 and 1948 by the author 
based on adjustments of annual totals for these years published by the 
Census Bureau. A new sample of the nonferrous firms, makers of alumi- 
num and copper castings, was established in September, 1952; the 
monthly totals for 1951 and 1952 have been revised on this basis. 
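The text does not say how the monthly sample figures were revised to the annual totals from the complete enumerations. A common approach, assumed here, is pro-rata benchmarking: every month of a year is multiplied by the same factor so that the twelve months add to the annual total while the month-to-month pattern is preserved. A Python sketch with invented figures (`benchmark_to_annual` is a hypothetical helper):

```python
def benchmark_to_annual(monthly, annual_total):
    """Scale monthly sample-based figures pro rata so they sum to annual_total."""
    sample_sum = sum(monthly)
    if sample_sum <= 0:
        raise ValueError("monthly figures must have a positive sum")
    factor = annual_total / sample_sum
    return [m * factor for m in monthly]


# Hypothetical monthly shipments (thousands of tons) reported by a partial sample
sample_months = [40, 42, 45, 44, 43, 38, 33, 37, 41, 44, 43, 40]
revised = benchmark_to_annual(sample_months, annual_total=540)
print([round(m, 1) for m in revised])
```

A single pro-rata factor per year can leave a step between December of one benchmark year and January of the next; more elaborate methods smooth that step, but nothing in the text indicates which, if any, was used.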

With the exception of the gray-iron industry the data reflect activity 
of all of the firms in each of these industries. Shipments of gray iron 
comprise only three of the five groups of producers often classified 
in the broad category, gray-iron foundries. Miscellaneous gray-iron 
castings, molds for heavy steel ingots, and chilled iron railroad car 
wheels are included in the series because the techniques involved in 
their manufacture and sale are generally similar to the processes of 
other foundries studied here; cast iron pressure pipe and fittings and 
cast iron soil pipe and fittings are excluded for they are standardized- 
inventory items which are made and marketed quite differently from 
the custom-made products included in the study. 


TABLE 3

MONTHLY SHIPMENTS OF ALUMINUM, COPPER, MALLEABLE-IRON, STEEL, AND GRAY-IRON
CASTINGS IN THE UNITED STATES, UNADJUSTED FOR SEASONAL FLUCTUATIONS

[Table body not recoverable from the scan.]

3. SEASONAL ADJUSTMENTS


The data in Table 3 show a seasonal pattern. Relatively low rates 
of production during summer months, even during the war years, 
reflect the difficulty of obtaining high productivity in foundries during 
warm weather. These recurrent variations in output have been removed
in two steps: each series has been adjusted first for the number of
working days in the month and then for seasonal fluctuations.


3.1. Calendar Factors 


The working schedule of foundries varied widely during this period; 
in general, it involved a single shift, a five-day week, and the observ- 
ance of six holidays. No modification was made in the figures for vari- 
ations in number of shifts; even during the war years substantially 
less than one-half of the foundries worked more than one shift [14, p. 
17]. Further, adjustments for shift operations made in response to 
changing business conditions, even if they were possible, would tend to 
reduce the sensitiveness of the index to cyclical movements. 

The first adjustment made in the figures is for changes in the num- 
ber of days worked per month. In general, foundries operated five 
days per week throughout the period. No corrections were made for 
variations in working schedules which occurred as a result of changing 
business conditions in the early thirties or in the 1949 recession. Such 
corrections tend to make the index less sensitive to cyclical fluctuations, 
because alterations in the usual work week are, themselves, measures 
of business change. Moreover, adjustments of working schedules dif- 
fered too radically from foundry industry to foundry industry and 
from area to area to render frequent modification valid. Exceptions to 
this are the period from 1923 to 1931, when the five and one-half day 
week was the custom, and the period of World War II from January,
1941 to August, 1945, when foundries usually operated six days per 
week [13, pp. 54-55]. 

The second adjustment is for changes in the number of holidays. 
Prior to 1942, five holidays were observed: New Year’s Day, Inde- 
pendence Day, Labor Day, Thanksgiving Day, and Christmas Day. 
During 1942, 1943, and 1944 only New Year’s Day, Independence 
Day, and Christmas Day were celebrated; in 1945 Thanksgiving Day 
was added; and, in 1946 and later years, Memorial Day and Labor 
Day were included. On the basis of these work weeks and holidays, 
calendar adjustments in the monthly figures were made in order to 
remove the influence of variations in the days worked during the month. 
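The calendar adjustment described above amounts to dividing each month's shipments by the number of days worked in that month. The following is a minimal sketch, assuming a Monday-to-Friday week in which Saturdays count as a fraction of a day for the five-and-one-half-day (1923-1931) and six-day (1941-1945) schedules. The function names and the `saturday_weight` parameter are hypothetical, and holidays must be supplied as explicit dates rather than derived from the rules in the text.

```python
import calendar
import datetime


def working_days(year, month, holidays=(), saturday_weight=0.0):
    """Count working days in a month.

    Monday-Friday count 1 each, Saturday counts saturday_weight (0 for a five-day
    week, 0.5 for five and one-half days, 1 for six days), Sunday counts 0.
    Dates listed in holidays are skipped.
    """
    holiday_set = set(holidays)
    days_in_month = calendar.monthrange(year, month)[1]
    total = 0.0
    for day in range(1, days_in_month + 1):
        date = datetime.date(year, month, day)
        if date in holiday_set:
            continue
        weekday = date.weekday()  # Monday = 0 ... Sunday = 6
        if weekday < 5:
            total += 1.0
        elif weekday == 5:
            total += saturday_weight
    return total


def average_daily_shipments(shipments, year, month, holidays=(), saturday_weight=0.0):
    """Calendar-adjusted figure: shipments per working day."""
    return shipments / working_days(year, month, holidays, saturday_weight)


july_4 = datetime.date(1950, 7, 4)
print(working_days(1950, 7, [july_4]))                       # 20.0
print(working_days(1950, 7, [july_4], saturday_weight=1.0))  # 25.0
```

For example, July 1950 has 21 weekdays. Removing Independence Day (a Tuesday that year) leaves 20 working days on a five-day week, or 25 on a six-day week.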
3.2. Seasonal Factors


Pronounced seasonal changes arise from natural working conditions of the plants and fluctuating demands for their products. Since concurrent information is available for all metals from January, 1943, the ten-year period following this date was selected for a provisional computation of seasonal factors. The twelve-month-moving-average method of adjustment was employed, and the resulting indexes were computed by the modified-mean technique, whereby the extreme values found for each month were excluded from the calculation of the mean.¹ This method was useful in removing the influence of such incidental variations as the steel strike of July, 1952 from the final indexes.
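The ratio-to-moving-average computation with a modified mean can be sketched as follows. This is an illustration under stated assumptions, not the author's worksheet. The twelve-month average is assumed to be centered (a 2-by-12 average), "extreme values" is taken to mean the single highest and single lowest ratio for each month, and the factors are rescaled to average 100. All three choices are assumptions, and the function names are hypothetical. At least two full years of data are needed so that every month receives a ratio.

```python
def centered_12_month_average(series):
    """Centered (2 x 12) moving average; None where the window is incomplete."""
    n = len(series)
    out = [None] * n
    for i in range(6, n - 6):
        first = sum(series[i - 6:i + 6]) / 12   # months i-6 .. i+5
        second = sum(series[i - 5:i + 7]) / 12  # months i-5 .. i+6
        out[i] = (first + second) / 2           # centres the average on month i
    return out


def modified_mean(values):
    """Mean after discarding the single highest and single lowest value."""
    ordered = sorted(values)
    kept = ordered[1:-1] if len(ordered) > 2 else ordered
    return sum(kept) / len(kept)


def seasonal_factors(series, start_month=0):
    """Twelve seasonal indexes (averaging 100) from a monthly series.

    series[0] falls in calendar month start_month (0 = January).
    """
    if len(series) < 24:
        raise ValueError("need at least two years of monthly data")
    averages = centered_12_month_average(series)
    ratios = {m: [] for m in range(12)}
    for i, (value, avg) in enumerate(zip(series, averages)):
        if avg is not None:
            ratios[(start_month + i) % 12].append(100.0 * value / avg)
    raw = [modified_mean(ratios[m]) for m in range(12)]
    scale = 1200.0 / sum(raw)  # rescale so the twelve factors average 100
    return [round(f * scale, 1) for f in raw]


# Hypothetical series: one seasonal pattern repeated for six years at a constant level.
pattern = [100, 104, 106, 104, 102, 100, 86, 92, 104, 102, 100, 100]
print(seasonal_factors(pattern * 6))
# [100.0, 104.0, 106.0, 104.0, 102.0, 100.0, 86.0, 92.0, 104.0, 102.0, 100.0, 100.0]
```

Recovering an imposed pattern from a trendless series is only a sanity check. On real data the modified mean is what keeps a single disturbed month, such as a strike, from dominating that month's factor.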

A second series of seasonal indexes was computed, based this time on 
the six post-World-War II years of 1947 through 1952. These factors 
(Table 4) show a more pronounced seasonal pattern than those
based on a ten-year period including the war years of 1943, 1944, and 
1945, and the post-war readjustment year, 1946. This is consistent with 
experience in other industries reported by the Federal Reserve Board 
[3, p. 1263]. 

The ferrous series, malleable iron, steel, and gray iron, have higher 
seasonal peaks in the spring than in the fall; in this they follow the 
pattern of durable manufactures in general. On the other hand, the 
nonferrous indexes, aluminum and copper, behave similarly to the 
producers of nondurable goods and reach their annual production 
heights in the fall.² This contrasting situation may be explained in part by the differing uses of ferrous and nonferrous castings. Ferrous castings are used extensively in heavy construction and transportation machinery, industries showing high rates of activity in the spring; nonferrous castings are employed in aircraft, instruments, hardware, and light tools, industries having little seasonal fluctuation or high production levels in the fall [3, pp. 1292-3].


4. MOVEMENTS IN THE SERIES 


After adjustment for calendar and seasonal fluctuations, the foundry 
series exhibit broadly similar patterns of output change as they move 
from phase to phase of the business cycle. Variations are still present, 
however, in the individual responses of the industries to different in- 
dustrial situations. 





1 This method was used because fairly reliable information could be obtained from only a limited amount of material. Had information covering a longer period been available, the technique used by the Board of Governors of the Federal Reserve System would have been employed [1].

2 In the newly revised Index of Industrial Production a single set of seasonal adjustment factors is 
applied to all primary metals [3, p. 1264]. 
4.1. Relationships Among Foundry Series, 1943-1954


The concurrent information available for all series shown in Figure 1 has been adjusted for seasonal fluctuations (Table 4) and stated comparably in pounds on an average-daily-shipments basis. Most of the fluctuations can be identified with nationally publicized events. Variations in the series during 1943 and 1944 resulted from two causes:


TABLE 4

SEASONAL ADJUSTMENT FACTORS OF SHIPMENT DATA OF
SELECTED CASTINGS INDUSTRIES IN THE UNITED
STATES, BASED ON 12-MONTH MOVING AVERAGE
ADJUSTMENTS FOR 6-YEAR PERIOD, JANUARY,
1947, TO DECEMBER, 1952, INCLUSIVE

Month Aluminum Copper Malleable Steel Gray Iron

January 99 101 101 98 103
February 105 104 104 107 103
March 103 103 106 105
April 104 105 103 105
May 101 104 102
June 97 101 104
July 84 82
August 91 92 90
September 104 102
October 101 100
November 102 102
December 99 104

Source: Table 3 and accompanying text.


(1) Adjustments were made to the data for all years on the basis of 
seasonal factors computed from the years 1947 to 1952, inclusive, a 
period in which seasonal variations were larger than during the war. 
The use of the same seasonal factors throughout stresses the mechanical 
character of the foundry index and the changes in seasonals which 
have occurred under specific conditions in the past. At the same time, 
these factors distort the adjusted series during 1943 and 1944 by intro- 
ducing counter-seasonal variations. (2) Because controls were handled 
on a calendar-quarter basis by the War Production Board, they seem 
to have imparted some special quarterly fluctuations to the output of 
foundries. Apparently, more optimistic estimates at the beginning of a quarter resulted in larger allocations during the first month than in subsequent months, when more limited stocks appeared to be available; more pessimistic estimates tended to produce the reverse effect [13, p. 51].
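Once a set of twelve factors is fixed, the seasonal adjustment is a division by the factor for the corresponding calendar month, with the same factors reused for every year as described in (1) above. This is a minimal sketch, and the function name is hypothetical. Only the January and February rows of Table 4 are complete, so apart from the two aluminum values (99 and 105) the factor list in the usage line is a placeholder.

```python
def seasonally_adjust(values, factors, start_month=0):
    """Divide monthly values by seasonal indexes expressed with 100 = no seasonal effect.

    values[0] falls in calendar month start_month (0 = January); the same twelve
    factors are applied in every year.
    """
    if len(factors) != 12:
        raise ValueError("expected twelve monthly factors")
    return [v * 100.0 / factors[(start_month + i) % 12] for i, v in enumerate(values)]


# Placeholder factors: only January (99) and February (105) are read from Table 4.
aluminum_factors = [99, 105] + [100] * 10
print(seasonally_adjust([990.0, 1050.0], aluminum_factors))  # [1000.0, 1000.0]
```

Because the factors never change, any shift in the true seasonal pattern passes straight into the adjusted series. This is how the counter-seasonal movements in 1943 and 1944 arise.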

In later years reconversion, strikes, and recessions were important 
influences. All of the series dropped in August, 1945, as the result of 
V-J Day; the more radical decline and slower recovery of the alumi- 
num, copper, and steel series indicate a greater dependence on war 
work, especially in the case of aluminum. Except for steel, all industries 
declined in November, 1948 at the start of the 1949 recession. The lag 
of one month by steel may be accounted for by the dislocation of pro- 
duction schedules by strikes earlier in the year. General consistency of 
timing is notable also in the recovery from the 1949 recession; in June 
an upturn occurred in each series. The impact of the long steel strike 
during the fall makes it difficult to determine whether recovery would 
have been continuous or faltering had this interruption not taken place. 
Strikes in the steel industry usually have an immediate and pro- 
nounced effect on all of the ferrous foundries; strikes in associated in- 
dustries such as coal and carborundum seem to have confined their 
effects to steel castings. Similarities in timing are evident also in the 
slight advances in each series following the outbreak of the Korean 
War and the downturns in each series at the beginning of the 1954 
recession in August, 1953. 


4.2. Relationships Among Foundry Series, 1929-1947 


Since no monthly information concerning three of the series is avail- 
able for the cyclically important pre-war period, less satisfactory 
annual data are employed to assess the magnitude of their reactions to 
severe depression and recovery. Production indexes based on figures 
from various Censuses of Manufactures [11] are shown in Table 5. In
general, the industry coverages represented by these figures are similar 
to, but not identical with, the coverages involved in the monthly 
series. 

Broadly speaking, the conclusions reached earlier concerning the 
behavior of the casting series are reinforced by this information. In 
general, the production changes of all foundry industries show wide 
cyclical movements; again, individual dissimilarities are apparent. In 
1937, recoveries from depression lows were much more complete in the ferrous than in the nonferrous series; in 1939, however, output declined in all fields except aluminum, where the beginnings of expanded aircraft production were reflected.
5. COMPARISON WITH INDEXES OF INDUSTRIAL PRODUCTION


In Figure 2 the longer record of operations in the malleable iron and steel fields is compared with the general Index of Industrial Production and the component Index of Durable Manufactures. This permits an examination in detail of changes in output of at least two of the foundry industries during a period of wide cyclical movements, information not obtainable directly from the limited historical series available for aluminum, copper, and gray-iron.


TABLE 5

INDEXES OF PHYSICAL PRODUCTION: ALUMINUM, COPPER,
MALLEABLE-IRON, STEEL, AND GRAY-IRON CASTINGS
IN THE UNITED STATES FOR SELECTED
YEARS, 1929 TO 1939, INCLUSIVE
(1929 = 100)

Year  Aluminum   Copper     Malleable  Steel
1929  100        100        100        100
1931  48         51         41         33
1933  33         18         36         21
1935  58         24         59         37
1937  56         58         90         86
1939  79         43         69         50

Source: [11: 1931, pp. 839, 880, 908, and 992; 1936, p. 1079; 1937, Pt. I, pp. 936, 1023, and 1061; 1939, Vol. II, Pt. 2, pp. 198, 202, 205, 342, and 347].


5.1. Foundry Data More Volatile


Three differences between the foundry series and the indexes of the Federal Reserve Board account for most of the contrasting reactions: (1) When compared with the larger and broader indexes, the foundry information is, in a sense, a small sample of business activity; as such, it tends to display the well-known larger sampling errors of all small samples. (2) The casting series are computed more mechanically than the Federal Reserve Board’s indexes. With foundries a simple procedure has been followed by applying the same corrections and adjustments throughout the period; for example, seasonal factors computed from the years 1947 to 1952 have been used from the beginning to the end of the period. In the Federal Reserve Board’s indexes seasonal adjustments are modified continually and estimates are made before all returns are received; these practices tend to introduce subjective evaluations into the preliminary estimates, which may be carried over to some extent into the final or revised figures. (3) In the foundry series the use throughout the period of a single set of seasonal adjustment factors, based on the 1947-1952 period, underadjusts some important seasonal movements prior to World War II. This is especially notable in the malleable-iron data, for the seasonal requirements of the manufacturers of automobiles, railroad equipment, and heavy construction machinery were much greater in the 1920’s than they are now [3, pp. 1292-3 and 2, pp. 2-4].


5.2. Foundry Data Show Little Secular Trend


The general level of foundry activity has changed only a small 
amount in the past thirty years. The shipments of steel castings in 
1929 were exceeded by less than 8 per cent in the boom years of World 
War II; the output of malleable iron expanded only about one-third 
between 1929 and the Korean War. On the other hand, the Index of 
Industrial Production showed an increase of more than 100 per cent, 
the Index of Durable Manufactures, 150 per cent. This characteristic
of foundry data may be explained by the persistent shift from castings 
to other methods of forming metal parts, such as stamping, forging, 
and welding. 


5.3. Foundry Information More Sensitive to Cyclical Change


The heavy use of castings for capital improvements is reflected in the amplitude of changes in output levels during business fluctuations. In fact, the relative volatility of each of these series seems to vary with the importance of capital goods in the index. Thus, the foundry series are more sensitive than the Durable Manufactures Index, while the latter, in turn, is more variable than the Index of Industrial Production.

As the period advanced, the proportionate contractions from peak to trough (1929-1932, 1937-1938, and 1948-1949) became progressively greater in the foundry series than in the Federal Reserve indexes; thus, the declines for malleable iron and steel from 1948 to 1949 were 40 per cent and 58 per cent, respectively, while they were only 10 per cent and 17 per cent for Industrial Production and Durable Manufactures. This development may be explained by the new system of weights being employed by the Federal Reserve Board: instead of basing the importance of an industry on the gross value of its product, its influence is determined by the value added to its product. This, of course, decreases the significance of industries in which the value added is small in proportion to the total value of the product, and increases the emphasis on industries with a high value-added ratio. The importance of the change in weights here is that it renders the Federal Reserve indexes less sensitive to cyclical change than they were before the revision, since items with high value-added ratios, such as tools, instruments, and machinery, are generally less sensitive to business change than goods with low value-added ratios, such as coal, lumber, and blast-furnace products.³ [Cf. 3, pp. 1243 and 1277.]
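The effect of the weighting change on amplitude can be illustrated with two hypothetical industries: one with a low value-added ratio that contracts sharply, and one with a high ratio that contracts mildly. The figures and weights below are invented for illustration. Only the arithmetic reflects the text: a weighted average of production relatives, followed by a peak-to-trough percentage decline.

```python
def percent_decline(peak, trough):
    """Proportionate contraction from peak to trough, in per cent."""
    return 100.0 * (peak - trough) / peak


def composite(relatives, weights):
    """Weighted arithmetic mean of production relatives."""
    total = sum(weights.values())
    return sum(relatives[name] * w for name, w in weights.items()) / total


# Hypothetical relatives (peak = 100 for both industries).
peak = {"low_value_added": 100, "high_value_added": 100}
trough = {"low_value_added": 50, "high_value_added": 90}

gross_value_weights = {"low_value_added": 60, "high_value_added": 40}
value_added_weights = {"low_value_added": 30, "high_value_added": 70}

for label, w in [("gross value", gross_value_weights), ("value added", value_added_weights)]:
    drop = percent_decline(composite(peak, w), composite(trough, w))
    print(f"{label}: {drop:.0f} per cent decline")
# gross value: 34 per cent decline
# value added: 22 per cent decline
```

Shifting weight toward the high value-added industry shrinks the composite decline from 34 to 22 per cent, even though neither component series has changed. This is the direction of change the text attributes to the Federal Reserve revision.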


5.4. Foundry Information Changes in Phase With Other Indexes 


In the foundry series it is difficult to distinguish the actual peaks 
and troughs of business activity from the residual seasonal fluctuations 
and the incidental changes resulting from the small size of the
sample. Apart from these problems, once cyclical changes are located 
in the Index of Industrial Production, similar movements are apparent 
in the malleable and steel series. Since the Index exhibits a slight lead 
at peaks and a rough coincidence at troughs [9, p. 60], it may be as- 
sumed that the foundry series do not depart markedly from this timing. 


6. AN INDEX OF FOUNDRY ACTIVITY 


Although foundry series generally move at the same time and in the 
same direction, they do exhibit individual variations that tend to dis- 
tract attention from the more fundamental responses all foundries 
make to economic change. An effective way of overcoming this diffi- 
culty is to combine the five casting series into a composite index of 
foundry activity. 


6.1. Alternative Weighting of Foundry Index 


Production relatives on a 1947-49 base were computed for each
foundry series shown in Table 3 after the data had been adjusted for 
seasonal variations (Table 4). The resulting schedules were combined 
into four composite indexes of foundry activity; the first was weighted 
equally, the second by physical volume of shipments, the third by 
dollar volume of shipments, and the fourth by value added to product. 
The weights were derived from the data in Table 1. 
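The construction of a composite index from production relatives can be sketched as follows. The sketch assumes series that have already been adjusted for seasonal variation and a fixed weight for each metal. The four schemes in the text differ only in the `weights` mapping. Table 1 is not reproduced in this section, so the shipments and weights in the usage example are hypothetical, as are the function names.

```python
def relatives(series, base_positions):
    """Express a series as percentages of its average over the base period."""
    base = sum(series[i] for i in base_positions) / len(base_positions)
    return [100.0 * x / base for x in series]


def weighted_index(series_by_metal, weights, base_positions):
    """Fixed-weight arithmetic average of production relatives, period by period."""
    rel = {metal: relatives(s, base_positions) for metal, s in series_by_metal.items()}
    total_weight = sum(weights[metal] for metal in rel)
    periods = len(next(iter(rel.values())))
    return [
        sum(rel[metal][t] * weights[metal] for metal in rel) / total_weight
        for t in range(periods)
    ]


# Hypothetical shipments for 1947, 1948, 1949, 1950; positions 0-2 form the 1947-49 base.
shipments = {"steel": [90, 100, 110, 120], "aluminum": [200, 200, 200, 100]}
print(weighted_index(shipments, {"steel": 1, "aluminum": 1}, [0, 1, 2]))
# [95.0, 100.0, 105.0, 85.0]
```

Because each relative averages 100 over the base years, the composite also averages 100 over 1947-49 whatever weights are chosen. The weighting scheme affects only the movements outside the base period.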

Each of these devices has certain unique advantages: (1) By the system of equal weights the composite index reflects changes in each of the five industries without regard to the physical or monetary importance of the castings shipped. (2) The method of weighting the series according to the number of pounds of castings shipped by each industry places greatest emphasis on the gray-iron group. These foundries were found to exhibit greater production stability than the other series because their castings are used more extensively in consumers’ products than in war or defense lines.

3 For example, the low point reached in the fall of 1949 by the Metal Fabricating component of the Index of Industrial Production was only 80, while the comparable point for the Primary Metals component was 46 [3, p. 1298].

(3) The technique of weighting the series according to the value of 
the castings produced by each industry results in giving less importance 
to the gray-iron group and somewhat more to the costly nonferrous 
castings. It imparts emphasis to aircraft and marine castings, promi- 
nent in defense and war activity. (4) The use of value added data as a 
basis of weights is consistent with the weighting system used in the 
Indexes of Industrial Production and Durable Manufactures. In effect, 
this weighting reduces the relative importance of nonferrous castings 
by eliminating the costly metal from the figures. Actually, the index 
based on value added to product is almost identical with the one based 
on total value of product. 


6.2. Selection of the Most Appropriate Weights 


It is not enough to come to a logical conclusion in the selection of weights; it is necessary, also, to see how well the index employing the weights behaves in practice. Two sets of data are available: (1) biennial figures for the period 1929-1939, and (2) annual information for the period 1943-1953.

6.2.1. The pre-war period. Shown in Table 6 are weighted indexes of 
foundry activity derived from the Census of Manufactures data for 
the years 1929 through 1939. The relatives, on a 1929 base (Table 5), 
have been combined and weighted by the importance of each metal 
in 1947 (Table 1) to form indexes of foundry activity similar to those 
described above. For comparison, five related indexes of general economic activity are presented: (1) Private Investment, a narrow measure which includes construction of new plants by business, purchases of producers’ durable equipment, and changes in business inventories.
It is, in effect, a measure of domestic, non-government investment. 
(2) Total Investment, a broader series which adds to (1) above: net 
foreign investment (current balance of payments) and government 
purchases of goods and services. This is consistent with the usual defi- 
nition of investment. (3) Durable Manufactures and Industrial Pro- 
duction, current Federal Reserve indexes adjusted to the 1929 base. 
(4) Gross National Product, a measure of the total economic activity 











870 AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1954 


in the United States. This series and the two investment ones were 
computed from data stated in constant (1939) dollars [15]. 

A comparison of the related indexes in Table 6 reveals a general similarity of behavior, with two important exceptions: (1) Private Investment, Total Investment, and Durable Manufactures reacted more violently to the depression than the other two series; this observation is consistent with the notion that investment and durable-goods activity change more drastically than general business operations during depressions. (2) From 1937 to 1939 Total Investment and Gross National Product increased while the other series decreased; evidently spending by the Federal government, which increased Total Investment and was reflected in a larger Gross National Product, did not carry over to Private Investment, Durable Manufactures, or Industrial Production sufficiently to expand those series. Fluctuations in each of the foundry indexes follow movements in Private Investment.


TABLE 6

INDEXES OF FOUNDRY ACTIVITY WEIGHTED VARIOUSLY WITH
1947 FACTORS COMPARED WITH INDEXES OF RELATED
ACTIVITIES IN THE UNITED STATES FOR SELECTED
YEARS 1929 TO 1939, INCLUSIVE
(1929 = 100)

      Foundry Indexes Weighted                    Related Indexes
Year  Equally    Physical   Dollar     Value      Private    Total      Durable    Industrial Gross
                 Volume     Volume     Added      Invest-    Invest-    Manu-      Produc-    National
                                                  ment       ment       factures   tion       Product
1929  100        100        100        100        100        100        100        100        100
1931  44         46         45         45         40         46         52         68         84
1933  29         35         32         33         11         31         40         63         72
1935  47         53         49         50         45         50         63         80         86
1937  77         91         85         86         77         68         92         103        102
1939  64         74         70         70         66         71         82         98         106

Source: Tables 1 and 5; [15, pp. 26-27]; and [3, pp. 1324 and 1326].
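The second exception can be checked mechanically against the 1937 and 1939 rows of Table 6. In this small sketch the values are transcribed from the table, and the helper name is hypothetical.

```python
# (1937, 1939) values from Table 6, 1929 = 100.
TABLE_6 = {
    "Foundry, equally weighted": (77, 64),
    "Foundry, physical volume": (91, 74),
    "Foundry, dollar volume": (85, 70),
    "Foundry, value added": (86, 70),
    "Private Investment": (77, 66),
    "Total Investment": (68, 71),
    "Durable Manufactures": (92, 82),
    "Industrial Production": (103, 98),
    "Gross National Product": (102, 106),
}


def percent_change(before, after):
    return 100.0 * (after - before) / before


for name, (v1937, v1939) in TABLE_6.items():
    print(f"{name:28s} {percent_change(v1937, v1939):+6.1f}%")

rising = sorted(name for name, (a, b) in TABLE_6.items() if b > a)
print(rising)  # ['Gross National Product', 'Total Investment']
```

Only Total Investment and Gross National Product rise between the two years. Private Investment falls by about 14 per cent, and the value-added foundry index falls by about 19 per cent.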

6.2.2. The war and post-war period. In Table 7 indexes similar to those 
shown in Table 6 are presented for a later period and computed to a 
later base, 1947-1949. The foundry indexes are derived from the 
monthly values in Table 3; the related indexes are from the national 
income⁴ and Federal Reserve data used in computing the comparable
series in Table 6. 





4 In this period, quarterly national income information is available on a current-dollar basis; it was not employed because the precision added by quarterly figures did not appear to outweigh the inherent inflationary variations, which could not be completely removed from the data.
Substantially more diversity is evident in the movements of the related series during this period than during the earlier one. The enormous expenditures by the government for war equipment, especially in 1943, 1944, and 1945, account for a large portion of the differences. Conflicting changes which occurred during the later years are, however, a logical outcome of the economic process. Castings are employed in the construction of plants and equipment for the manufacture of producers’ goods, consumers’ goods, and war goods. The outbreak of war in Korea caused an initial expansion of Private Investment and concomitant foundry production to create and adapt new factories before large government orders could be filled. Thus, increases in government spending that swelled Total Investment were preceded by the use of private investment funds by foundries and others in building or modernizing facilities with which to make the required war goods.⁵ Contrasting movements of Private and Total Investment from 1951 to 1952 emphasize the curtailing of Private Investment (and foundry activity) after the production facilities for the new war were developed.


TABLE 7

INDEXES OF FOUNDRY ACTIVITY WEIGHTED VARIOUSLY WITH
1947 FACTORS COMPARED WITH INDEXES OF RELATED
ACTIVITIES IN THE UNITED STATES,
1943-1953, INCLUSIVE
(1947-49 = 100)

      Foundry Indexes Weighted                    Related Indexes
Year  Equally    Physical   Dollar     Value      Private    Total      Durable    Industrial Gross
                 Volume     Volume     Added      Invest-    Invest-    Manu-      Produc-    National
                                                  ment       ment       factures   tion       Product
1943  101        77         87         86         27         163        162        127        103
1944  105        77         89         88         33         183        159        125        110
1945  89         75         81         80         42         162        123        107        108
1946  93         83         88         87         102        103        86         90         97
1947  106        100        102        102        96         97         101        100        98
1948  109        102        104        104        114        105        104        104        101
1949  82         82         82         82         90         98         95         97         101
1950  110        104        107        106        134        114        116        112        110
1951  124        117        120        119        138        141        128        120        118
1952  110        100        103        103        122        146        136        124        121
1953  119        106        111        110                              153        134

Source: Tables 1 and 3; [15, pp. 26-27]; [3, pp. 1324 and 1326]; and Federal Reserve Bulletin, March, 1954, p. 295, for “Preliminary” 1953 values for Index of Industrial Production and Index of Durable Manufactures.





5 Actually the Federal government's total purchases of goods and services were larger by $2.1 
billion (1939 dollars) in 1949 than they were in 1950 [15, p. 27]. 







On the basis of these comparisons, weighting the foundry series by 
value-added figures seems to be most appropriate: (1) It gives each 
foundry industry and associated area an importance relative to the ex- 
penses of producing the castings without regard to the cost of the metal 
from which the castings are formed. (2) It tends to give less influence 
to nonferrous castings than the equally-weighted scheme and thus 
provides an index less heavily affected by occurrences in the aircraft 
and marine fields. (3) It gives less prominence to gray-iron castings and 
more to aluminum than the physically-weighted index and so tends to 
present a more balanced picture of the over-all investment economy. 
(4) It is, in essence, a workable average system of weighting which possesses the good points of all of the foundry series without the vulnerability of either of the extreme weights. Values for the Index of Foundry Activity are shown in Table 8.


7. EVALUATION OF THE INDEX 


The appraisal of the Index of Foundry Activity is facilitated by a 
comparison with the Indexes of Industrial Production, Durable Manu- 
factures, and Private Investment in Figure 3. Specific attributes are 
apparent. 


7.1. Timing of Movements Roughly Similar to Those in Index of Indus- 
trial Production 


Similarities of movement are evident between the Foundry Index 
and the Index of Industrial Production; since the latter has been found 
to exhibit some lead at peaks and general consistency at troughs [9, p. 
60], it may be reasoned that the Foundry Index, also, is timed in this 
general fashion. 


7.2. Magnitude of Movements Greater than Index of Industrial Produc- 
tion 


Over the period investigated, the size of the fluctuations in foundry 
activity has been substantially greater than changes in the Index of 
Industrial Production and Gross National Product. In other than war 
years, changes in the Foundry Index approximate in a rough way the 
amplitude of movements of private investment (Tables 6 and 7); in 
spite of the difficulty of comparing annual and monthly data in Figure 
3, the generalization is supported also by this illustration.

[Figure 3: indexes plotted by year, 1943 to 1954]

Fig. 3. Comparison of Index of Foundry Activity with Indexes of Industrial Production, Durable Manufactures, and Private Investment in the United States, 1943 to 1954, Inclusive, Seasonally Adjusted (1947-49 = 100).

Source: Foundry data from Table 3, seasonally adjusted by factors from Table 4, and combined in a weighted-relative index using the value-added-to-product figures from Table 1; Table 8; and [3, pp. 1324 and 1326].
TABLE 8

INDEX OF FOUNDRY ACTIVITY IN THE UNITED STATES
SEASONALLY ADJUSTED
(1947-49 = 100)

Year Jan. June July

90 98
87 95
1943 78 80 79
1944 94 90 89
81 86
86 98
103 101
103 107
82 80
1945 88 87 88
1946 71 59 78
1947 102 102 103
1948 105 105 103
1949 95 96 89
1950 86 86 86 106 115
1951 121 121 126 119
1952 106 110 109 87 75
1953 116 117 117
1954 96 93 88

Source: Foundry data from Table 3, seasonally adjusted by factors from Table 4, and combined in a weighted-relative index using the value-added-to-product figures from Table 1.


7.3. Determination of Index Values Highly Mechanical


Index values may be obtained by applying a stereotyped set of procedures to the raw data. A disadvantage of this method is that local, extraneous variations are transmitted to the final index. Two alternatives are available: (1) identifying the special events by supplementary notes, and (2) editing the original information to remove the extraneous material. The first alternative has been selected here on the assumption that some incidental fluctuations and possible misinterpretations of the index are easier to comprehend and to correct than an unknown amount of editing.


7.4. Availability of Data Relatively Prompt 


Foundry data in the Facts for Industry series are customarily issued 
about six or seven weeks after the end of the month for which informa- 
tion is published. This is about as soon as the Preliminary Index of 
Industrial Production is released by the Board of Governors in the 
mimeographed Business Indexes supplement to National Summary of 
Business Conditions. It is substantially earlier than the publication in the Survey of Current Business of private investment data collected by the Office of Business Economics of the Department of Commerce and the Securities and Exchange Commission [5, p. 9].
CARGO LOSS IN FERRYING OPERATIONS


Walter L. Deemer, Jr.
United States Air Force* 


1. INTRODUCTION


IN FERRYING operations of valuable items (e.g., aircraft spare parts
needed in a theater of operations during a war), the number of items
to be carried in each aircraft is frequently under the control of the 
planner. By using more aircraft he can make the loading per aircraft 
smaller and hence possibly reduce the probability of large losses at the 
risk of increasing the probability of small losses. (The small loading 
also increases the actual ferrying cost.) Because of the value of the 
items it may be a good investment to buy this insurance. 

In making his decision as to how many items to load per aircraft, 
the planner therefore needs to know, as a function of the loading, the 
probability of losing any given number of items. Sometimes the mean 
and variance may give sufficient information, without knowledge of 
the more laboriously calculated probabilities. Usually, however, the distributions are not even approximately symmetrical, nor even unimodal,
so that an adequate approximation using only the mean and variance 
is not possible. 

In this paper expressions are given for the mean and the variance of the number of items lost, for the probability of losing a given number of items, and for the probability generating functions. The moment generating functions may be found from the probability generating functions by replacing the generating variable with its exponential. For Models 1, 3, and 4 several formulas, in addition to the probability generating function and the general formula (m.4), are given for the probabilities of losing exactly a items.¹ These special formulas are useful for certain values of a. Apparently this problem has not been previously considered in this form in the statistical or operational research literature.

We denote by k the number of flights to be used in the ferrying opera- 
tion. No assumptions are required as to how many aircraft are used in 
the operations; there may be k aircraft, each making one trip, or fewer 
aircraft may be used, some or all making more than one trip. 

Each aircraft carries r valuable items, so that a total of $N = kr$ items are ferried. Each item is assumed to be equipped with a life raft or some other means of preventing it from sinking in case the aircraft ditches. In the event that the aircraft ditches it is assumed that the items leave the aircraft and float. (If the objects are inanimate, it is assumed that they are jettisoned.) Some or all of the life rafts or flotation devices may be equipped with radio transmitters or radar reflectors to help the search.

* The views expressed here are those of the author and are not to be construed as reporting official or unofficial policies of the United States Air Force.
1 $(m.z)$, $m = 1, 2, 3, 4$, represents formula z for Model m. These are exhibited in Table 1.

It is assumed that there is a constant probability p that an aircraft will be forced to ditch (i.e., make a forced landing on the water) before it has delivered its items. It is also assumed that each trip is independent of all the other trips, so that the distribution of the number of aircraft ditched is given by the expansion of $(p+q)^k$, where $q = 1 - p$.

When an aircraft is ditched, a search is made for the floating objects.² Some of these may be recovered. The items not recovered will
be referred to as lost. 

Four models are considered, each model being defined by a set of 
assumptions on the behavior of the floating items and the character- 
istics of the search operation. 


2. NOTATION 


$k$ = number of carriers used
$r$ = number of items per carrier
$N = kr$ = total number of items carried
$p$ = probability that a carrier will ditch
$q = 1 - p$
$t$ = probability that an individual item will be found
$s = 1 - t$
$f$ = probability that a clump of lost items will be located
$g = 1 - f$
$A(k, p, j) = \binom{k}{j} p^{j} q^{k-j} = \frac{k!}{j!\,(k-j)!}\, p^{j} q^{k-j}$, if $0 \le j \le k$
$A(k, p, j) = 0$, if $j > k$ or $j < 0$
$A(0, p, 0) = 1$
$E_m(X)$ = expected value of the random variable $X$ under Model $m$
$V_m(X)$ = variance of $X$ under Model $m$
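The function $A(k, p, j)$ is the binomial probability of exactly j ditchings in k independent flights. Below is a direct transcription in Python. The function name mirrors the notation, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def A(k, p, j):
    """Probability of exactly j successes (ditchings) in k independent trials."""
    if j < 0 or j > k:
        return 0.0
    return comb(k, j) * p**j * (1 - p) ** (k - j)


print(A(3, 0.5, 1))  # 0.375
print(A(0, 0.2, 0))  # 1.0, matching A(0, p, 0) = 1
```

Summing A(k, p, j) over j = 0, ..., k gives 1, since the terms are those of the expansion of $(p+q)^k$.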


3. DESCRIPTION OF THE MODELS


Model 1. By the assumptions of this model the ditched items ($jr$ in number if j aircraft are ditched) are distributed over a wide area, so that the conditional distribution of the number of items recovered, given that j aircraft have ditched, is given by the terms of $(t+s)^{jr}$, where t is the probability of recovering a single item and $s = 1 - t$.





2 The cost of maintaining search facilities is not relevant to this problem because these are main- 
tained in any case for the rescue of crews. 
Model 2. Here the items are assumed to float together (either because they are tied together or because the wind and waves have not separated them) so that if one item is found they are all found. Then if f is the probability of finding a clump of r items, the conditional probability of finding ir items given that j aircraft are ditched is the term in $f^i$ of the expansion of $(f+g)^j$, where $g = 1 - f$ and, as before, j is the number of aircraft ditched.

Model 3. Here the items do not clump to the extent they do in Model 2. We assume that there is a probability f of getting in the vicinity of the r semi-dispersed items and then, having arrived in the vicinity of the items, a probability t of finding a single item. (One might get in the vicinity of a clump, for example, by finding the ditched aircraft or debris from it.) The conditional probability of getting in the vicinity of i clumps given that j aircraft have been ditched is the term in $f^i$ of $(f+g)^j$. For each such clump the probability of recovering $\alpha$ items is the term in $t^{\alpha}$ of $(t+s)^r$.

Model 4. This is like Model 3 except that it is assumed that the way one gets in the vicinity of a clump is actually to find an item. For example, one of the r items on each aircraft might have a special radio transmitter or a radar reflector. Here, then, the probability of finding the first item of a clump of r items is f. Having found the first item, the probability of finding each successive item in that clump is t.

General Comments on the Models. It is not claimed that the mathe- 
matical models here considered fit exactly any actual ferrying situation. 
For example, it is somewhat unlikely that any actual ferrying situation 
fits Model 1, though it might fit when the items are dropped individu- 
ally by parachute over a wide area. Some of the other models seem to 
be adequate representations of actual situations. 

Models 1 and 2 are special cases of Models 3 and 4 as follows: if f = 1 in Model 3 we have Model 1; if t = 1 in Model 3 or Model 4 we have Model 2.
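The four verbal model descriptions translate directly into a simulation, which can serve as an independent check on the closed-form results of Table 1. The function `simulate_lost` below is a hypothetical sketch that assumes independent aircraft and, within Models 1, 3 and 4, independent searches for the individual items, exactly as stated above.

```python
import random

def simulate_lost(model: int, k: int, r: int, p: float, f: float, t: float,
                  rng: random.Random) -> int:
    """Draw one value of X (items lost) under Model 1, 2, 3 or 4."""
    lost = 0
    for _ in range(k):                      # each of the k aircraft independently
        if rng.random() >= p:               # aircraft not ditched (probability q)
            continue
        if model == 1:                      # items scattered: each found with prob. t
            lost += sum(rng.random() >= t for _ in range(r))
        elif model == 2:                    # clump found with prob. f, all or nothing
            lost += 0 if rng.random() < f else r
        elif model == 3:                    # reach the vicinity (f), then each item (t)
            if rng.random() < f:
                lost += sum(rng.random() >= t for _ in range(r))
            else:
                lost += r
        elif model == 4:                    # first item found with prob. f, others with t
            if rng.random() < f:
                lost += sum(rng.random() >= t for _ in range(r - 1))
            else:
                lost += r
        else:
            raise ValueError("model must be 1, 2, 3 or 4")
    return lost

rng = random.Random(1)
draws = [simulate_lost(3, k=6, r=1, p=0.1, f=0.2, t=0.4, rng=rng) for _ in range(100_000)]
print(sum(draws) / len(draws))   # compare with kpr(1 - tf) from Table 1
```

Setting f = 1 or t = 1 in the Model 3 branch reproduces Models 1 and 2, mirroring the special cases just noted.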


4. RESULTS 


The results are given in Table 1. In the summation indices [(a — 1) /r] 
means the greatest integer less than or equal to (a—1)/r; for a=0, 
this is equal to —1. 


5. DISCUSSION OF RESULTS 


In Models 1, 2, and 3 the expected number of items lost is not a function of the loading, as evidenced by the fact that k and r appear only as the product kr = N in the expressions for the expected number of items lost. In Model 4, the expected number of items lost is an increasing function of r, the loading per aircraft. This results from the fact that in getting in the vicinity of a clump one item is found. In all models the variance is an increasing function of r, the loading per aircraft.

TABLE 1
SUMMARY OF RESULTS

Model 1
(1.1) Expected number of items lost: $kprs$
(1.2) Variance of number of items lost: $kprs(t+rsq)$
(1.3) Probability generating function: $M_1(\phi) = [p(t+s\phi)^r + q]^k$
(1.4) Probability of losing exactly $a = br + c$ items ($0 \le c < r$)*: $\sum_j A(k,p,j)\,A(jr,s,a)$
(1.5) Same, useful for small a: $\dfrac{s^a}{a!}\,D_t^{a}\,(q+pt^r)^k$

Model 2
(2.1) Expected number of items lost: $kprg$
(2.2) Variance of number of items lost: $kpr^2g(1-gp)$
(2.3) Probability generating function: $M_2(\phi) = [p(f+g\phi^r) + q]^k$
(2.4) Probability of losing exactly $a = br$ items: $C^k_b\,(pg)^b\,(q+pf)^{k-b}$

Model 3
(3.1) Expected number of items lost: $kpr(1-tf)$
(3.2) Variance of number of items lost: $kpr[qr(1-tf)^2 + tf(s+rtg)]$
(3.3) Probability generating function: $M_3(\phi) = \{p[f(t+s\phi)^r + g\phi^r] + q\}^k$
(3.4) Probability of losing exactly $a = br + c$ items ($0 \le c < r$)*: $\sum_j \sum_{i=j-b}^{j} A(k,p,j)\,A(j,f,i)\,A(ir,t,jr-a)$
(3.5) Same, useful for small a: $\sum_{n=0}^{b} \dfrac{C^k_n\,g^n p^n s^{a-nr}}{(a-nr)!}\,D_t^{a-nr}(q+pft^r)^{k-n}$
(3.6) Same, useful for $kr - a$ small: $\sum_j \dfrac{A(k,p,j)\,t^{jr-a}}{(jr-a)!}\,D_s^{jr-a}(g+fs^r)^j$

Model 4
(4.1) Expected number of items lost: $kp[r-(s+rt)f]$
(4.2) Variance of number of items lost: $kp\{fts(r-1) + fg(s+rt)^2 + q[r-f(s+rt)]^2\}$
(4.3) Probability generating function: $M_4(\phi) = \{p[f(t+s\phi)^{r-1} + g\phi^r] + q\}^k$
(4.4) Probability of losing exactly $a = br + c$ items ($0 \le c < r$)*†: $\sum_j \sum_i A(k,p,j)\,A(j,f,i)\,A[i(r-1),\,t,\,jr-i-a]$
(4.5) Same, useful for small a: $\sum_{n=0}^{b} \dfrac{C^k_n\,g^n p^n s^{a-nr}}{(a-nr)!}\,D_t^{a-nr}(q+pft^{r-1})^{k-n}$

* $\sum_j$ is the sum from $j = [(a-1)/r] + 1$ to $j = k$.
† $\sum_i$ is the sum from $i = j - b$ to $i = \min(jr-a,\ j)$.
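The loading claims above can be checked numerically by transcribing the mean and variance formulas of Table 1. In the sketch below, the function `moments` and the parameter values p = .1, f = .2, t = .4 are illustrative assumptions; N = kr = 6 is held fixed while the loading r varies.

```python
def moments(model: int, k: int, r: int, p: float, f: float, t: float):
    """Expected value and variance of the number of items lost, from Table 1."""
    q, s, g = 1 - p, 1 - t, 1 - f
    if model == 1:
        return k*p*r*s, k*p*r*s*(t + r*s*q)
    if model == 2:
        return k*p*r*g, k*p*r**2*g*(1 - g*p)
    if model == 3:
        return k*p*r*(1 - t*f), k*p*r*(q*r*(1 - t*f)**2 + t*f*(s + r*t*g))
    if model == 4:
        mu = r - (s + r*t)*f                 # expected loss given a ditching
        return k*p*mu, k*p*(f*t*s*(r - 1) + f*g*(s + r*t)**2 + q*mu**2)
    raise ValueError("model must be 1, 2, 3 or 4")

N = 6  # total items; each loading r must divide N
for model in (1, 2, 3, 4):
    for r in (1, 2, 3, 6):
        mean, var = moments(model, N // r, r, p=0.1, f=0.2, t=0.4)
        print(f"Model {model}, k={N // r}, r={r}: E={mean:.4f}, V={var:.4f}")
```

At these parameter values the printout shows the Model 3 mean constant in r, the Model 4 mean rising with r, and every variance rising with r.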

After deciding (in conference with the operations officer) which model is most applicable to the actual situation, the analyst can prepare numerical tables showing the probability of losing a items for various loadings, based on estimates of the probabilities p, t, and f involved; p, t, and f can frequently be estimated rather reliably from previous experience. These tables can be used in deciding what loading to use.

The probability of losing a items has been found useful in making 
decisions as to the desirability of developing salvage equipment, and as 
to the best type of salvage equipment. 


6. DISCUSSION OF FORMULAS 


6.1. The Probability Generating Functions* 


The functions $M_m(\phi)$ are the expected values of $\phi^X$, where X is the random variable, number of items lost:

$M(\phi) = P\{X=0\} + \phi P\{X=1\} + \cdots + \phi^{kr} P\{X=kr\}$,

and hence the probability that X = a is the coefficient of $\phi^a$ in the expansion of $M(\phi)$.

When either k or r is small, M(¢) is easy to expand and the numerical 
evaluation after the expansion is quite easy. When both k and r are 
large, the expansion and numerical evaluation are usually tedious. 
Example

When r = 1, the coefficient of $\phi^a$ in $M_3(\phi)$ is

$C^k_a\,(pft+q)^{k-a}(pfs+gp)^a$.

When r = 2, k = 3 the expansion of $M_3(\phi)$ is

$M_3(\phi) = A^3 + 3A^2B\phi + (3AB^2 + 3A^2C)\phi^2 + (B^3 + 6ABC)\phi^3 + (3B^2C + 3AC^2)\phi^4 + 3BC^2\phi^5 + C^3\phi^6$,

where $A = pft^2 + q$, $B = 2pfts$, $C = pfs^2 + gp$.
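When neither k nor r is small, the expansion of M_3(φ) need not be done by hand. Raising the one-aircraft polynomial q + pf(t + sφ)^r + pgφ^r to the kth power by repeated polynomial multiplication yields every P{X = a} at once. The sketch below uses hypothetical function names and plain coefficient lists; it reproduces the coefficients in A, B, and C given above.

```python
from math import comb

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of phi)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def loss_distribution_m3(k, r, p, f, t):
    """Coefficients of M_3(phi) = {p[f(t + s phi)^r + g phi^r] + q}^k, i.e. P{X = a}."""
    q, s, g = 1 - p, 1 - t, 1 - f
    # One aircraft: q + p f (t + s phi)^r + p g phi^r, expanded in powers of phi.
    inner = [p * f * comb(r, a) * s**a * t**(r - a) for a in range(r + 1)]
    inner[0] += q
    inner[r] += p * g
    result = [1.0]
    for _ in range(k):
        result = poly_mul(result, inner)
    return result                 # result[a] = P{X = a}, a = 0, ..., kr

dist = loss_distribution_m3(k=3, r=2, p=0.1, f=0.2, t=0.4)
print([round(x, 5) for x in dist])
```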


* For a more complete discussion of probability generating functions see W. Feller, An Intro- 
duction to Probability Theory and Its Applications, New York, John Wiley and Sons, 1950, p. 212 ff. 
The advantage of $M_m(\phi)$ for calculating $P\{X=a\}$ depends not on the value of a but on the values of r and k. The other special formulas, discussed below, are suitable for special values of a, independent of r and k.
6.2. Special Formulas for Large and Small a 


Whether the special formulas (1.5), (3.5), (3.6), and (4.5) are simpler than the general formulas (m.4) depends not on the values of r and k but on the value of a.

The utility of these special formulas may be illustrated by exhibiting the formulas for P{X=0}, P{X=1}, and P{X=6} for (k = 6, r = 1) using the equations (3.4), (3.5), and (3.6).

By (3.4)

$P\{X=0\} = A(0,f,0)A(0,t,0)A(6,p,0) + A(1,f,1)A(1,t,1)A(6,p,1) + \cdots + A(6,f,6)A(6,t,6)A(6,p,6)$;

$P\{X=1\} = A(6,p,1)[A(1,f,0)A(0,t,0) + A(1,f,1)A(1,t,0)] + A(6,p,2)[A(2,f,1)A(1,t,1) + A(2,f,2)A(2,t,1)] + \cdots + A(6,p,6)[A(6,f,5)A(5,t,5) + A(6,f,6)A(6,t,5)]$;

$P\{X=6\} = A(6,p,6)[A(6,f,0)A(0,t,0) + A(6,f,1)A(1,t,0) + \cdots + A(6,f,6)A(6,t,0)]$.

By (3.5)

$P\{X=0\} = (q+pft)^6$;
$P\{X=1\} = 6p(q+pft)^5(sf+g)$.

By (3.6)

$P\{X=6\} = p^6(g+fs)^6$.

These examples show the great simplification possible with the spe- 
cialized formulas. On the other hand, the use of equations (m.4) may 
be systematized so that the routine of setting up a computing table 
and computing from it may be easily learned and for the average com- 
puter this is an advantage. In Section 8 an example is worked in detail 
and in Table 2 a sample computing sheet is shown. 
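To see that the specialized formulas agree with the general one, the double sum (3.4) can be evaluated directly and compared with the three closed forms just given. The sketch assumes the illustrative values used in Section 8 (p = .1, f = .2, t = .4); the function name `prob_lost_m3` is ours. Python's floor division gives [(a − 1)/r] = −1 for a = 0, matching the convention stated in Section 4.

```python
from math import comb

def A(n, p, j):
    """Binomial term; zero outside 0 <= j <= n."""
    return comb(n, j) * p**j * (1 - p)**(n - j) if 0 <= j <= n else 0.0

def prob_lost_m3(a, k, r, p, f, t):
    """Equation (3.4): sum over j aircraft ditched and i clumps located."""
    b = a // r
    total = 0.0
    for j in range((a - 1) // r + 1, k + 1):
        for i in range(max(j - b, 0), j + 1):
            total += A(k, p, j) * A(j, f, i) * A(i * r, t, j * r - a)
    return total

k, p, f, t = 6, 0.1, 0.2, 0.4
q, s, g = 1 - p, 1 - t, 1 - f
closed = {
    0: (q + p*f*t)**6,                    # by (3.5)
    1: 6*p*(q + p*f*t)**5*(s*f + g),      # by (3.5)
    6: p**6*(g + f*s)**6,                 # by (3.6)
}
for a, value in closed.items():
    print(a, prob_lost_m3(a, k, 1, p, f, t), value)
```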
7. OUTLINE OF PROOFS OF EQUATIONS 
Three random variables will occur; these will be denoted as follows:

X: Number of items lost
Y: Number of aircraft ditched
Z: Number of clumps located.


The index m refers to the models. For example, $P_m\{X=a\}$ means the probability of losing a items under model m.

A perfectly general equation for $P_m\{X=a\}$ is

(7.1) $P_m\{X=a\} = \sum_j \sum_i P_m\{X=a \mid Y=j;\ Z=i\} \cdot P_m\{Z=i \mid Y=j\} \cdot P_m\{Y=j\}$.

Conceptually j runs from 0 to k and i runs from 0 to j. In the equations given in Table 1 for $P_m\{X=a\}$, limits on j and i which do not cover these ranges are given because for some values of i and j some of the probabilities are identically 0, and hence these terms are omitted. For example, in $P_m\{X=kr\}$ only j = k gives a non-zero value of $P_m\{X=kr \mid Y=j;\ Z=i\}$. Equations (m.4) of Table 1 (m = 1, 2, 3, 4) are based on (7.1) with the substitution for the conditional probabilities of the binomial terms appropriate to the particular model.


The probability generating functions (m.3) are the expected values of $\phi^X$:

(7.2) $E_m(\phi^X) = \sum_{a=0}^{kr} \phi^a P_m\{X=a\}$.

These may be readily evaluated by using the right side of (7.1) for $P_m\{X=a\}$, first evaluating

$\sum_{a=0}^{kr} \phi^a P_m\{X=a \mid Y=j;\ Z=i\}$,

then multiplying by $P_m\{Z=i \mid Y=j\}$ and summing over i, and finally multiplying by $P_m\{Y=j\}$ and summing over j.

The probability generating functions may also be evaluated by formal manipulation of conditional expectations.⁴ If X is a random variable with a binomial distribution, $P\{X=j\} = A(k,p,j)$, then

$E(\phi^X) = (p\phi + q)^k$.

The conditional expectation of $\phi^X$ given Y and Z may be evaluated using this fact; the conditional expectation of $\phi^X$ given Y, and finally the unconditional expectation of $\phi^X$, may be evaluated by taking the expectations of the resulting expressions with respect to Z and then with respect to Y.

4 A description of this method is included at the suggestion of a referee.

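The special equations (1.5), (3.5), (3.6), and (4.5) are derived by using the two following facts.

First,

(7.3) $\dfrac{p}{w+1}\left[D_q + \dfrac{w}{p}\right] A(n,p,w) = A(n,p,w+1)$,

where $D_q$ means partial differentiation with respect to q, p being replaced by 1 − q. This may be demonstrated by direct application of the operation indicated on the left: $D_q A(n,p,w) = [(n-w)/q - w/p]\,A(n,p,w)$, and $(n-w)p/[(w+1)q]$ is exactly the ratio $A(n,p,w+1)/A(n,p,w)$.

Second, a solution of the difference equation

(7.4) $F(w+1,p) = \dfrac{p}{w+1}\left[D_q + \dfrac{w}{p}\right] F(w,p)$  (where $q = 1-p$)

is

(7.5) $F(w,p) = \dfrac{p^w}{w!}\,D_q^{w} F(0,p)$.

This may be proved by induction on w, (7.5) being identically true for w = 0; the induction step uses $[D_q + w/p](p^w G) = p^w D_q G$.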
For example, (3.5) is derived as follows. From (7.1), putting n = j − i (the number of clumps not located),

(7.6) $P_3\{X=a\} = \sum_n \sum_j A(jr-nr,\ s,\ a-nr)\,A(j,g,n)\,A(k,p,j)$,

which we write

$P_3\{X=a\} = \sum_n P_n(a)$.

Setting a = nr,

(7.7) $P_n(nr) = \sum_{j=n}^{k} A(k,p,j)\,A(j,g,n)\,t^{(j-n)r} = C^k_n\,g^n p^n (q+pft^r)^{k-n}$.

Let w = a − nr; then $P_n$ plays the role of F of (7.5), with s and t in the roles of p and q. Hence,

(7.8) $P_n(a) = C^k_n\,g^n p^n\,\dfrac{s^{a-nr}}{(a-nr)!}\,D_t^{a-nr}(q+pft^r)^{k-n}$,

and

(7.9) $P_3\{X=a\} = \sum_{n=0}^{b} C^k_n\,g^n p^n\,\dfrac{s^{a-nr}}{(a-nr)!}\,D_t^{a-nr}(q+pft^r)^{k-n}$,

which is equation (3.5) of Table 1.
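The derivation (7.6) through (7.9) can be checked numerically by carrying out the repeated differentiation $D_t^{a-nr}$ term by term on the binomial expansion of $(q+pft^r)^{k-n}$ and comparing the result with the general sum (3.4). The helper names below are hypothetical; only the formulas come from the text.

```python
from math import comb, factorial, perm

def A(n, p, j):
    """Binomial term; zero outside 0 <= j <= n."""
    return comb(n, j) * p**j * (1 - p)**(n - j) if 0 <= j <= n else 0.0

def d_power(w, m, q, c, r, t):
    """w-th derivative with respect to t of (q + c t^r)^m, evaluated at t,
    obtained by differentiating the binomial expansion term by term."""
    total = 0.0
    for l in range(m + 1):
        e = r * l                              # exponent of t in this term
        if e < w:
            continue                           # term differentiated away
        total += comb(m, l) * q**(m - l) * c**l * perm(e, w) * t**(e - w)
    return total

def p3_small_a(a, k, r, p, f, t):
    """Equation (3.5): sum over n, the number of clumps never located."""
    q, s, g = 1 - p, 1 - t, 1 - f
    total = 0.0
    for n in range(min(a // r, k) + 1):
        w = a - n * r
        total += (comb(k, n) * (g * p)**n * s**w / factorial(w)
                  * d_power(w, k - n, q, p * f, r, t))
    return total

def p3_general(a, k, r, p, f, t):
    """Equation (3.4): double sum over aircraft ditched j and clumps located i."""
    b = a // r
    return sum(A(k, p, j) * A(j, f, i) * A(i * r, t, j * r - a)
               for j in range((a - 1) // r + 1, k + 1)
               for i in range(max(j - b, 0), j + 1))

for a in range(7):
    print(a, p3_small_a(a, 3, 2, 0.1, 0.2, 0.4), p3_general(a, 3, 2, 0.1, 0.2, 0.4))
```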

The same general method is used to get (1.5), (3.6), and (4.5). No equation for large a, i.e., a near kr, is needed in Model 4 because when a is large there are few terms in the sum over i of equation (4.4).

The moments for any model may be obtained from $M_m(\phi)$ by substituting $e^{\theta}$ for $\phi$, which gives the moment generating function. The $a$th moment about 0 is then the $a$th derivative of the moment generating function evaluated at $\theta = 0$.

Alternatively the moments may be evaluated using conditional 
moments and successively removing the conditions. See, for example, 
M. H. Hansen, W. N. Hurwitz, W. G. Madow, Sample Survey Methods 
and Theory, Volume II, New York, John Wiley and Sons, 1953, p. 59 ff. 


8. EXAMPLE 


A numerical example will be worked in detail to show the computing 
schemes used and to indicate how decisions may be made based on 
these models. The values used for probabilities and costs in this exam- 
ple are not authentic because true values based on actual experience 
are classified. 

Model 3 is the model assumed for the example. According to this model the probability of an aircraft ditching is p; the probability of getting in the vicinity of the r items which were carried in the ditched aircraft is f; and, having arrived in the vicinity of the items, the probability of finding a single item is t.


8.1. Numerical work for the example
For the example the following numerical values are used:

kr, the total number of valuable items to be carried, 6. Each of the four possible loadings per aircraft, 1, 2, 3, and 6, is investigated.

p = .10, the probability that an aircraft ditches.

f = .20 and t = .40 when no salvage device is used;

f = .40 and t = .60 with a salvage device. The salvage device might be a radio transmitter which sends automatically when the items are jettisoned.


The expected number of items lost (i.e., ditched and not recovered) 
is, for any loading: 


without salvage device: 6(0.1)[1 − (0.2)(0.4)] = .55.
with salvage device: 6(0.1)[1 − (0.4)(0.6)] = .46.
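The arithmetic above can be reproduced in a few lines. Without rounding, the reduction is 0.552 − 0.456 = 0.096 items, so at the text's illustrative value of a million dollars per item the break-even device cost comes to about $96,000. The round figure of $90,000 below reflects the rounded values .55 and .46. The function name is ours.

```python
def expected_lost_m3(N, p, f, t):
    """E_3(X) = N p (1 - t f); depends on N = kr only, not on the loading."""
    return N * p * (1 - t * f)

without = expected_lost_m3(6, 0.1, f=0.20, t=0.40)
with_device = expected_lost_m3(6, 0.1, f=0.40, t=0.60)
value_per_item = 1_000_000               # dollars, the text's illustrative figure
reduction = without - with_device
print(f"without: {without:.3f}  with: {with_device:.3f}  reduction: {reduction:.3f}")
print(f"break-even device cost: ${reduction * value_per_item:,.0f}")
```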
The salvage device reduces the expected number of items lost by about 0.09 items. If the items are worth a million dollars each, a salvage device which costs less than about $90,000 would be worth developing. The cost must be the net cost of salvage devices for all six items, less the value of the remaining salvage devices after the ferrying operation has been completed.

The calculation of the probability of losing 0, 1, …, 6 items for
each of the four possible loadings with and without a salvage device 
can be arranged so that the computing labor using equation (3.4) is not 
excessive. The computing scheme shown in Table 2 has been found 
quite satisfactory. 

This scheme is simply a method for systematizing the calculation 
of the double summation which gives the probability of losing exactly 
a items, P{a}. Usually when making decisions based on P{a}, it is 
necessary to calculate P{a} for all (k, r) sets, and when this is necessary the general formula (m.4) for P{a} is frequently more efficient than the specialized formulas for P{a} which are also given for some models.

Table 2 is for k=6, r=1, no salvage device. k=6, r=1 was chosen 
for the detailed description of Table 2 because this (k, r) pair requires 
the most extensive computing sheet. 

On an actual computing sheet only the numerical values are entered 
because the literal values are not necessary and may be confusing to 
the calculator. Blanks in the body of Table 2 are to be read as zeros. 

The entries on the computing sheet which are in Roman type are original inputs, based on the assumed values of k, r, p, f, and t. The values in italics and bold face are derived values. The operation being performed in arriving at the italicized values is the multiplication of P{a | i, j} by P{i | j} for a fixed j and adding over all possible i values. In terms of the formula given in Table 1 the italicized values are:


$\sum_{i=j-b} A(j,f,i)\,A(ir,t,jr-a)$,

where A(0, t, 0) = 1 by definition. For example, the italicized value of .7787 found in the row j = 3 and the column for 3 items lost is the sum of products (.512)(1.00) + (.384)(.600) + (.096)(.36) + (.008)(.216) = .7787, or in literal notation

A(3, f, 0)A(0, t, 0) + A(3, f, 1)A(1, t, 0) + A(3, f, 2)A(2, t, 0) + A(3, f, 3)A(3, t, 0).


To find P{a} the italicized values are multiplied by the P{j} on 
the same line and these products are summed for each column. The
results are the bold face values at the bottom of the columns, which are 
the P{a} values. From Table 2: 
P{0} = .5604 = (.5314)(1.00) + (.3543)(.08) + (.0984)(.0064) 
+ (.0146)(.0005) + (.0012)(.00004)


P{1} = .3407 
P{4} = .0009
P{5} = P{6} = .0000. 
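The computing-sheet scheme of Table 2 amounts to a small two-dimensional array, with one row per number of aircraft ditched j and one column per number of items lost a. The sketch below recomputes the italicized entries and the bold-face totals for k = 6, r = 1. The function name and layout are our assumptions, since Table 2 itself is not reproduced here. For r = 1 each aircraft carries one item, which is lost with probability p(1 − ft) = .092, so the bold-face row should simply be a binomial distribution; this gives an easy cross-check.

```python
from math import comb

def A(n, p, j):
    """Binomial term; zero outside 0 <= j <= n."""
    return comb(n, j) * p**j * (1 - p)**(n - j) if 0 <= j <= n else 0.0

def computing_sheet(k, r, p, f, t):
    """Rows j = aircraft ditched, columns a = items lost (Model 3, equation (3.4))."""
    N = k * r
    P_j = [A(k, p, j) for j in range(k + 1)]              # Roman-type inputs P{j}
    italic = [[0.0] * (N + 1) for _ in range(k + 1)]       # blanks read as zeros
    for j in range(k + 1):
        for a in range(j * r + 1):                         # at most jr items can be lost
            b = a // r
            italic[j][a] = sum(A(j, f, i) * A(i * r, t, j * r - a)
                               for i in range(max(j - b, 0), j + 1))
    bold = [sum(P_j[j] * italic[j][a] for j in range(k + 1)) for a in range(N + 1)]
    return P_j, italic, bold

P_j, italic, bold = computing_sheet(k=6, r=1, p=0.1, f=0.2, t=0.4)
print("italicized value, j=3, a=3:", round(italic[3][3], 4))
print("P{a}:", [round(x, 4) for x in bold])
```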


Table 3 summarizes the probability values for the example. Values 
are presented for the four possible (k, r) pairs, each with and without 
a salvage device. The individual probabilities are given in the left 
hand column of each pair of columns, and cumulated values of the 
probabilities are given in the other column. The cumulated values 
show the probability of losing at least so many items. For example, in 
Table 3, the .440 in the third column means that the probability of 
losing one or more items (at least one item) is .440. 
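The cumulated columns of Table 3 can be regenerated from (3.4) for all four loadings, with and without the salvage device. The sketch below uses the example's parameter values; the function names are hypothetical, and the figures it prints are recomputed rather than copied from Table 3.

```python
from math import comb

def A(n, p, j):
    """Binomial term; zero outside 0 <= j <= n."""
    return comb(n, j) * p**j * (1 - p)**(n - j) if 0 <= j <= n else 0.0

def loss_probs(k, r, p, f, t):
    """P{X = a}, a = 0, ..., kr, under Model 3 via equation (3.4)."""
    out = []
    for a in range(k * r + 1):
        b = a // r
        out.append(sum(A(k, p, j) * A(j, f, i) * A(i * r, t, j * r - a)
                       for j in range((a - 1) // r + 1, k + 1)
                       for i in range(max(j - b, 0), j + 1)))
    return out

def at_least(probs):
    """Cumulated column: entry m is P{X >= m}."""
    return [sum(probs[m:]) for m in range(len(probs))]

p = 0.1
devices = {"none": (0.20, 0.40), "salvage": (0.40, 0.60)}   # (f, t)
table3 = {}
for label, (f, t) in devices.items():
    for k, r in [(6, 1), (3, 2), (2, 3), (1, 6)]:
        table3[label, k, r] = at_least(loss_probs(k, r, p, f, t))
        print(label, f"k={k} r={r}", [round(x, 3) for x in table3[label, k, r]])
```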


8.2. Decision making for the example⁵


If one loading resulted in a probability of loss which was less than or equal to that of any other loading for every number of items lost, that loading would be uniformly best.

The necessity for executive decision is based on the fact that this is 
not so. If one wishes to minimize the probability of losing one or more 
items, the best loading is k=1, r=6. But if one wishes to minimize the 
probability of losing 3 or more items, the best loading is k=6, r=1. 
However, the loading k=3, r=2 yields almost as low a probability of 
losing 3 or more items, as the loading k=6, r=1, and the cost of run- 
ning the operation has been halved since only half as many aircraft 
are involved in the operation. It appears from Table 3 that for the 
parameter values involved in the example the loading is a more impor- 
tant variable than the use or non-use of a salvage device. 

One might wish to minimize the probability of losing 6 items if the crucial consideration were to deliver at least one item. This would be true, for example, if the items were special code books. One code book might be absolutely essential for the performance of an operation, and the delivery of all six code books rather than just one would simply increase the ease with which the operation could be performed. Under these conditions one would ordinarily choose the loading that minimized the probability of losing 6 items, even if this loading led to the highest probabilities for all other numbers of items lost. This is an example of a rather typical situation: it is impossible to put a monetary value on the items being carried, and hence executive judgment is essential.

5 A referee suggests that reference be made to the recent book by D. Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions, New York, John Wiley and Sons, 1954, where a detailed discussion of statistical decision theory may be found.

If the items were key executives of a corporation, it might be con- 
sidered essential to have four of them survive. In this case one would 
wish to minimize the probability of losing 2 or more items (executives). 
From Table 3 it appears that k=6, r=1 or k=1, r=6 are almost equally 
good. If in this case one had no salvage device k= 1, r=6 would be the 
choice since this costs less than k=6, r=1 and the probabilities are 
equal. If one has a salvage device, however, k=6, r=1 is a better load- 
ing than k=1, r=6, since it leads to a slightly lower probability of 
losing 2 or more items (.071 vs. .091). The decision as to which loading 
to use depends on the extra cost of making six trips instead of one com- 
pared to the extra utility resulting from a reduction of 0.02 in the 
probability of losing 2 or more executives. But here again the utility 
cannot be measured in money and the decision must be made by execu- 
tive judgment. 





THE EXPERIMENTAL APPROACH IN THE TEACHING 
OF STATISTICS* 


Edwin G. Olds
Carnegie Institute of Technology 


1. INTRODUCTION 


At a session of the American Statistical Association, held five years ago at Cleveland, I presented a paper¹ on the use of instructional aids in teaching statistical quality control. At that time it was noted [2, pp. 223-24] that, in the past, the average teacher of elementary probability had made little use of experiments in his teaching, and the remark was made that

It is hard to understand why he failed to appreciate the pedagogical value of designing an experiment to illustrate a point of theory, predicting the result, running the experiment, and then taking the consequences if it turned out wrong.


Clearly, the existence of some “value” in the use of experiments in 
teaching statistics was implied. 

In view of the time which has elapsed, it seems appropriate to take 
a fresh look at the matter. 


2. TYPES OF EXPERIMENTS 
In a recent dictionary [4, p. 352] an experiment is defined as: 


A trial made to confirm or disprove something doubtful; an operation under- 
taken to discover some unknown principle or effect, or to test some sug- 
gested truth, or to demonstrate some known truth;... 


The four classes of trials can be reduced to two by combining aims one, three, and four. Then the types of experiments described in the definition might be classified as those performed:


(1) to test hypotheses, 
(2) for exploration. 


3. DEMONSTRATION LECTURES 


It would be possible to cross-classify experimentation in learning statistics on the basis of who performs the experiment. The so-called experiments used in demonstration lectures may be done by the teacher, by the students, or by both working together. Any individual student may be a participant or an observer.

* Presented under a slightly different title at the Annual Meeting of the American Statistical Association at Washington, D. C., December 29, 1953.

1 This paper appeared as Part III of a joint publication [2]. Bracketed numbers refer to the list of references at the end of this paper.

On reviewing my own demonstrations, I find it not at all clear how many of them should be classified as experiments. For analysis, a lecture plan from a manual [5], prepared for use in the intensive courses in statistical quality control given in various industrial centers during World War II, is presented below. Some description of the equipment used is given in [2, pp. 224-25], but it is sufficient to note that the bead-box of reference ordinarily contained 1152 white beads and 48 red beads.


II, 4—THE CONTROL CHARTS FOR FRACTION 
DEFECTIVE AND NUMBER OF 
DEFECTIVE ITEMS 


Objective 


1. To present the uses and purposes of a chart for fraction defective or number of defective items.

2. To give the techniques necessary for the construction and utilization of these charts.

3. To demonstrate the variation of fraction defective in samples drawn from a lot or process having a constant proportion of defective items.

4. To indicate the sensitivity of the limits in detecting a process change.


Procedure 


1. Make introductory remarks on the uses of one of these charts, including the fact that the necessary data may be readily available in the form of day-by-day accounting records on one hundred per cent inspection.

2. Point out that the box of beads is to represent a lot of material produced by a machine; that, after each sample, we must imagine that the lot is removed and a new lot presented. This means, of course, that the lots are uniform and the process is controlled. The samples should verify this fact.

3. Take three or four samples of 50 beads in order to indicate the procedure and method of computation.

4. Draw and record twenty samples.

5. Plot values for number of defective items and for fraction defective.

6. Compute values for central line and control limits, explaining that the limits are 3-sigma limits.

7. Put limits on both charts, then point out that they tell the same story. At this point discontinue consideration of chart for number of defective items.

8. Examine chart for indication of control and state that since we have analyzed past data and seem to be in control, we can use p as the standard value, p′, and extend the control limits for use during production.

9. Have students make charts for use in controlling the process and prepare to record and plot new sampling results.
10. Increase the percentage of defectives in the box and draw samples until
a point falls outside the control band.

11. Tell the students about the process change at the beginning of the 
second set of drawings, pointing out that it did not show itself imme- 
diately. 

12. Continue sampling until there are twenty samples in the second set. 

13. Have the students compute the limits for the new set of drawings and 
place them on the chart. 

14. Give students the value of original fraction defective in the bowl; point out that, since the first twenty samples of 50 were in control, their total could be considered as a single sample of 1000.

15. Using original fraction defective, calculate control limits for samples 
of 1000 and point out relationship of these limits to the first and second 
values of p. 

16. Return to consideration of the chart having two sets of limits and discuss 
the significance of the area common to the two bands. 


Principal points for emphasis 


1. Charts of this kind have proven to be very useful and easy to explain to 
management. 

2. Many students have found it desirable to start their use of control 
charts by analyzing data on fraction defective. 

3. Unless samples are quite large, these charts are not very sensitive to small changes.

4. Control charts for measurements are, in general, to be preferred. They 
give much more information for process analysis. 


A critical review of the outlined procedure raises the question as to 
what the student learns from it. Unless ample time is used to give him 
careful explanation he may learn little. I am convinced that much ex- 
perimental work falls short in garnering full educational value because 
the lecturer has been niggardly in the time allotted to orientation. 

Given sufficient time and some imagination an instructor can teach 
a good proportion of an elementary statistics course from just one dem- 
onstration of this type. 

In the first place, it can be pointed out that the bead box represents 
a universe or population. The population is finite and well-defined. 
Other populations of interest to statisticians may not be so well-defined. 
They may be so large as to seem almost infinite. 

Secondly, the individual units can be readily classified into two 
groups on the basis of color. If a red bead is called a defective, there 
will be little argument as to whether or not an individual unit is defec- 
tive. 

Third, the universe is characterized by a single parameter, denoted 
by p’. If weight were the quality under consideration, the classification 
would be based on measurements and the finite universe would be 
described by a frequency tabulation, needing at least two parameters
to summarize it adequately. 

Fourth, if we are interested in finding out what proportion of the 
population is defective, we might count the beads. In general, this pro- 
cedure is too expensive so we must depend on the uncertain evidence 
from a sample. Obviously a single unit is not sufficient to represent 
the universe so we need to examine several units. How large a sample 
should we take? How should we take it? What statistic should we 
calculate? 

A discussion of the above questions can carry us through the ele- 
ments of probability; the binomial and hypergeometric distributions, 
together with the Poisson and normal distributions as approximations 
to them; an introduction to point and interval estimation; tests of 
hypotheses; and acceptance sampling. Furthermore, the need for 
planned experimentation becomes evident. 

It might be kept in mind that all of this can be discussed before be- 
ginning the scheduled demonstration. With the box of beads and the 
sampling paddle as actors we have given reality to the characters of 
our statistical play. Our audience is well acquainted with their human 
frailties and will not be too much surprised if they misbehave. 

It does not seem necessary to fill in the details for the entire demon- 
stration, or to call attention to the opportunities for problem and theory 
assignments based on it. It might be remarked, however, that one per- 
formance of Steps 4-7 of the indicated procedure provides a single 
sample of the operation of the control chart for fraction defective. 
Assuming that the data has been produced by a controlled process of 
unknown fraction defective, this sample of one can be used to estimate 
the probability that the recommended procedure will err in indicating 
lack of control. (Step 8 is rather interesting because it does not seem to 
recognize the possibility of such an error.) Alternatively, the sample 
of one can be used to test some hypothesis regarding the probability 
of this error of the first kind. 
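The single-sample estimate suggested above can be extended by letting a machine repeat Steps 4-7 many times. The function below is a hypothetical sketch. It assumes the same bead box (p′ = .04), computes 3-sigma limits from each run's own p̄, and records whether any of the twenty points falls outside them even though the process has not changed. The resulting fraction estimates the probability of the error of the first kind for the procedure as a whole.

```python
import math
import random

def false_alarm_rate(reps, n=50, m=20, seed=0):
    """Fraction of repetitions of Steps 4-7 in which some point falls outside
    3-sigma limits computed from the same data, although nothing has changed."""
    box = [0] * 1152 + [1] * 48                # the bead box: p' = 0.04
    rng = random.Random(seed)
    alarms = 0
    for _ in range(reps):
        fr = [sum(rng.sample(box, n)) / n for _ in range(m)]
        p_bar = sum(fr) / m
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lcl, ucl = max(p_bar - 3 * sigma, 0.0), p_bar + 3 * sigma
        alarms += any(x < lcl or x > ucl for x in fr)
    return alarms / reps

print(false_alarm_rate(2000))
```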

The later parts of the indicated procedure provide other single 
samples of control-chart operation under various conditions. These 
samples could serve the purposes indicated above or they could be 
combined with the first sample to provide a sample of one from the 
population of all repetitions of the complete procedure as outlined. 
Parenthetically, the opportunity here for confusion by the student 
concerning sample size and the population sampled should be noted. 

Granted that, as a demonstration, the quoted procedure may have 
merit, the question remains as to whether or not it should be called 
an experiment. The answer seems to depend on the attitude of the in-
structor and students toward it. As indicated above, it is possible to
plan and execute the demonstration as one or several experiments. 
Furthermore, each may be performed for exploration or to test some 
hypothesis. 


4. LABORATORY WORK 


It is my impression that much of the laboratory work in statistics 
consists of working problems involving a considerable amount of cal- 
culation. Practice in the choice and use of formulas is valuable and 
there can be no quarrel with the usefulness of learning how to operate 
calculating machines. The question arises, however, whether it would 
be possible to broaden laboratory work to include a few experiments. 

In a paper [3] presented at the Chicago meeting in December, 1952, 
Professor A. C. Rosander proposed a list of some forty-five laboratory 
experiments for probability statistics and suggested that these be or- 
ganized into a manual. The preparation of such a manual would seem 
to be a useful enterprise. It would seem difficult, however, to avoid 
loading the manual with experiments requiring an undue amount of 
mechanical repetition and uninteresting computations. 

Industrial laboratories are beginning to be interested in statistically 
planned experimentation. For this work the cooperation of statisticians 
skilled in experimental design will be sought. It would seem desirable 
to begin the training in design by work in the statistical laboratory. 
Therefore, a manual such as Rosander proposes ought to include exer- 
cises on designing experiments as well as on executing them. 


5. THE EXPERIMENTAL APPROACH IN THE SEARCH FOR TRUTH


Up to this point I have considered the use of the experiment as an 
aid in digesting the existing body of statistical principles and methods. 
There is quite another aspect of the experimental approach which now 
deserves some attention. I refer to the use of experimentation in push- 
ing back statistical frontiers. 

The role of induction and generalization in the field of statistics is 
well known, with the contribution of “Student” as a shining example. 
It seems reasonable to expect that the study of samples will continue 
to feed the intuition needed to bridge the gap between the experimenter 
and his goal. 

With the greater availability of high-speed computers, experimental sampling can be done on a scale undreamed of fifty years ago. A Weldon² who rolls a set of 12 dice a total of 26,306 times yields his place in print to the machine which turns out a few million pseudo-random numbers while warming up for a really big job. The inductive reasoner no longer need restrict his operation to a few small samples.
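A machine replay of Weldon's experiment takes well under a second. The sketch below assumes fair dice and follows the usual description of Weldon's tally, the number of dice per throw showing a 5 or a 6. Under that assumption the count per throw is binomial with n = 12 and probability 1/3, so its mean should be close to 4. The seed is arbitrary.

```python
import random

rng = random.Random(12)
throws = 26_306
counts = [0] * 13                        # counts[x] = throws with x dice showing 5 or 6
for _ in range(throws):
    x = sum(rng.randint(1, 6) >= 5 for _ in range(12))
    counts[x] += 1

mean = sum(x * c for x, c in enumerate(counts)) / throws
print(counts)
print(f"mean per throw: {mean:.3f} (fair-dice value 12 * 1/3 = 4)")
```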

As a footnote to this discussion of using the experimental approach 
in the search for truth, an additional remark might be added. Belief 
is an individual matter. Some students require one type of proof, some 
another. One of my beginners could not accept the mathematical der- 
ivation of the probability density function for the sum of a sample of 
two from a rectangular universe. After he and his wife spent several 
hours drawing samples from a double deck of cards he was entirely 
satisfied with the truth of the theorem. 
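The student's card experiment can be mimicked in a few lines. The sketch assumes the cards carry the values 1 to 13 and that each card is replaced before the next draw, so the two draws are independent and rectangular (drawing from a double deck without replacement would differ slightly). The exact law of the sum is then triangular, P(S = s) = (13 − |s − 14|)/169 for s = 2, ..., 26. Names are hypothetical.

```python
import random
from collections import Counter

RANKS = range(1, 14)  # ace = 1, ..., king = 13 (an assumption about card values)


def triangular_pmf(s):
    """Exact law of the sum of two independent draws from the rectangular
    universe 1..13: P(S = s) = (13 - |s - 14|) / 169 for s = 2..26."""
    return (13 - abs(s - 14)) / 169 if 2 <= s <= 26 else 0.0


def simulated_pmf(pairs=10_000, seed=1):
    """Relative frequencies of the sum of two cards, each replaced before the next draw."""
    rng = random.Random(seed)
    counts = Counter(rng.choice(RANKS) + rng.choice(RANKS) for _ in range(pairs))
    return {s: counts[s] / pairs for s in range(2, 27)}


empirical = simulated_pmf()
for s in (2, 8, 14, 20, 26):
    print(s, round(empirical[s], 4), round(triangular_pmf(s), 4))
```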

This is, by no means, an isolated example. It is my belief that a con- 
siderable proportion of our elementary students have little faith (or 
interest) in mathematically established truths until they have seen 
experimental verification. One definition of “experiment” quoted above
was “A trial made . . . to demonstrate some known truth.”


For the thousands of men taught statistical quality control in the past 
dozen years the trials made to demonstrate the instructor-known 
truths often have seemed to be the only means of getting the message 
through to its destination. 


6. EVALUATION OF RESULTS 


My previous remarks regarding the experimental approach imply 
that it has some positive value. This opinion might well be tested by 
means of a designed experiment. Presumably any scientifically-minded 
educator who advocates a particular plan of teaching is willing to give 
his pet theory an objective test. Statisticians, in particular, claim that 
they prefer to base conclusions on facts rather than opinions. 

For our statistician-seedlings we might have two sets of treatments. 
The first set, as discussed in Section 2 above, might consist of the ex- 
ploratory experiment and the experiment to test hypotheses. The sec- 
ond set of treatments might include the experiment by the instructor 
in lecture (with or without help) and the experiment by the student 
in the laboratory. 

The treatments in the second set could be applied in various 
strengths. Either we could rush through the experiment, or we could 
provide various amounts of orientation before, during and after
performance. I have averred that the student learns little unless the
ground-work is carefully laid; but this is an opinion needing proof.

* Weldon’s dice data are given in [1, p. 278].

Measurements of two kinds of results should be made on our sub- 
jects. In the first place, increase in knowledge of existing principles 
and methods should be gauged. Second, there should be a measurement 
of discovery-value. 

Finally, the planned procedure ought to provide for a control group 
which receives no teaching by the experimental approach. 

It is not my intention to extend the discussion of this test any fur- 
ther beyond remarking that in order to plan and conduct it properly 
the cooperation of a professional in the field of educational tests and 
measurements probably would be necessary. 


7. CONCLUSION 


In the early part of this paper the opinion has been expressed that 
the experimental approach has value and a few suggestions have been 
made regarding combinations of treatments for best results. In the 
last section it has been suggested that these hypotheses should be 
tested by a planned experiment. Whether or not there is agreement with 
the earlier statements in this paper, I am confident that the recom- 
mendation for the collection of some unbiased evidence will receive 
support. 
USE OF EXPERIMENTS IN TEACHING
ENGINEERING STATISTICS* 


Irving W. Burr
Purdue University 


1. INTRODUCTION 


EXTENSIVE use of sampling experiments was one of the character-
istic features of the war-time intensive courses in statistical quality
control [5], and is still an integral part of the present short courses in
the subject. For those with as little mathematical skill and under-
standing as many industrial men have, derivations are out of the ques-
tion, and the only recourse is to demonstrate the theory by sampling
experiments. Industrial men are readily convinced by such experi-
ments, especially when they themselves do the sampling.

The situation would seem to be quite different when one is teaching 
engineering students who have had calculus. It is not as different, 
however, as it seems. Engineering students are intensely practical and 
few of them are fond of mathematical theory. Hence well-chosen and
carefully explained experiments are a valuable supplement to mathematical
derivation. Even for a student majoring in mathematics, experiments
can serve to illuminate the theory, and they certainly do give the
average student a clearer idea of “statistical thinking.”

The instructor should make clear (a) the principles involved, (b) the 
comparison of the observed results with theory, (c) the necessity for 
careful arrangement for randomness, and (d) the fallibility of the ex- 
periment, explaining any “misbehaving” of the experiment. 


2. SAMPLING POPULATIONS 


For fraction defective experiments a convenient way to simulate a
binomial distribution is to use a box of beads.¹ One color, say white,
can be called good, and another, say red, can be called defective. Then
if we use a great many more beads in the box than in a sample, the
actual hypergeometric distribution will give a good approximation to
the desired binomial. It is convenient to maintain 1000 white beads
and then vary the number of red ones for various fractions: 42 for 4
per cent (42/1042), 87 for 8 per cent, etc. Samples of 50 seem to work
well. A description of paddles for such drawing is given in part III of 


* Presented at the Annual Meeting of the American Statistical Association at Washington, D. C., 
December 29, 1953. 

1 Suitable beads one centimeter in diameter of various colors may be obtained from Walco Bead 
Co., 37 West 37th St., New York City. 
an article by Olds and Knowler [4]. The relative error in approximating
a binomial probability by a hypergeometric probability is approxi- 
mately [2] 


$$\approx -\,\frac{(d-np')^2-(q'-p')(d-np')-np'q'}{2Np'q'},$$


where N and n are the lot and sample sizes respectively, d is the number
of defectives in the sample, p’ is the fraction defective in the lot,
and q’ is 1 − p’.


TABLE 1
POISSON POPULATIONS FOR SAMPLING EXPERIMENTS

                       Number of Chips for Given Parameter
No. of Defects      c’ = 1    c’ = 2    c’ = 4    c’ = 6
0, 1, 2, ...        [chip counts not legible in this copy; the legible
                     column totals are 501, 500, and 501 chips]
Actual Average       1.004     2.006     4.002     6.020
Actual Variance      1.006     2.042     3.982     6.036
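The bead arithmetic and the quality of the hypergeometric approximation can both be checked directly. In the sketch below, the number of red beads for a fraction defective p’ is 1000p’/(1 − p’), rounded. The first-order relative-error expression coded in `approx_rel_error` was derived here from the hypergeometric terms; it is an assumption as to the form of the expression cited from [2], not a transcription of it. The exact ratio is computed with `math.comb`. Function names are hypothetical.

```python
from math import comb


def red_beads(fraction, white=1000):
    """Red beads to add to `white` white beads for the desired fraction defective."""
    return round(white * fraction / (1 - fraction))


def hypergeom_pmf(d, N, D, n):
    """P(d defectives in a sample of n from a lot of N containing D defectives)."""
    return comb(D, d) * comb(N - D, n - d) / comb(N, n)


def binom_pmf(d, n, p):
    return comb(n, d) * p**d * (1 - p) ** (n - d)


def approx_rel_error(d, N, n, p):
    """First-order approximation to (hypergeometric - binomial) / binomial."""
    q = 1 - p
    return -((d - n * p) ** 2 - (q - p) * (d - n * p) - n * p * q) / (2 * N * p * q)


red = red_beads(0.04)              # 42 red beads, as in the text
N, n = 1000 + red, 50              # lot of 1042 beads, samples of 50
p = red / N
for d in range(6):
    exact = hypergeom_pmf(d, N, red, n) / binom_pmf(d, n, p) - 1
    print(d, round(exact, 4), round(approx_rel_error(d, N, n, p), 4))
```

For this box the relative errors stay within a few per cent for the likely values of d, which supports the text's claim that the bead box gives "a good approximation to the desired binomial."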

For the Poisson distribution, no such convenient method is available, 
other than that of making the double approximation of the hyper- 
geometric for the Poisson. The best possibility would seem to be to 
use beads or chips, each numbered with a number of defects.² Then
drawing one chip or bead gives a value of c, the number of defects. For 
such a Poisson population it is desirable to have at least 500 pieces 
so that there will be a few rare values of c available in the population, 
in order that a c chart can go out of control. If too few chips, say 100, 
are in the population there probably will be no rare values of c—none 
can be rarer than 1 in 100. The populations shown in Table 1 are quite 
serviceable. It should be noted that if a sample from a population with 
a larger parameter, c’, is desired, this is readily available because of 
the additive property of the Poisson distribution. Thus, if we want 
c’ = 14, we can draw one chip from c’ = 8 and one from c’ = 6, and the
total of the two c values will have the desired distribution. 
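The additive property can be demonstrated with simulated chip boxes. In the sketch below, each box is built by rounding 500 times the Poisson probabilities to whole chips. This is a hypothetical construction, not a copy of Table 1, although for c’ = 1 it happens to give a 501-chip box. One chip is then drawn from the c’ = 8 box and one from the c’ = 6 box, and the sum should behave like a Poisson variable with c’ = 14, whose mean and variance both equal 14.

```python
import math
import random


def chip_population(c_mean, n_chips=500):
    """Hypothetical chip box: round n_chips * Poisson(c_mean) probabilities to whole
    chips, stopping once the upper tail rounds to zero (not the Table 1 counts)."""
    chips, c = [], 0
    while True:
        count = round(n_chips * math.exp(-c_mean) * c_mean**c / math.factorial(c))
        if count == 0 and c > c_mean:
            break
        chips.extend([c] * count)
        c += 1
    return chips


def sum_of_draws(boxes, draws=20_000, seed=14):
    """Draw one chip from each box and add the c values (Poisson additivity)."""
    rng = random.Random(seed)
    return [sum(rng.choice(box) for box in boxes) for _ in range(draws)]


totals = sum_of_draws([chip_population(8), chip_population(6)])
mean = sum(totals) / len(totals)
var = sum((t - mean) ** 2 for t in totals) / (len(totals) - 1)
print(f"mean {mean:.3f}, variance {var:.3f} (Poisson with c' = 14 has both equal to 14)")
```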

For normally distributed populations Table 2 gives a flexible set of 
distributions. These were used in the war-time courses and are still in 
wide use. Another convenient way to generate approximately normal 
populations is through using various numbers of dice. Although the 
distribution of the number of points on a single die is rectangular, the 
total number of points on 3 dice is fairly normal, as the following 
theoretical distribution shows: 


Total, 3 dice:      3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18   Total
216 × Probability:  1  3  6 10 15 21 25 27 27 25 21 15 10  6  3  1     216


For n dice at a throw we have the following theoretical results for the 
total points: 


$$\bar{X}' = 3.5n, \qquad \sigma' = \sqrt{\frac{35n}{12}}, \qquad \beta_2' = 3 - \frac{1.27}{n}$$
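These theoretical results can be verified exactly by convolving the single-die distribution n times. The sketch below uses Python's `fractions.Fraction`, so the moments come out as exact rationals. The third quantity, 3 − 1.27/n, is read here as the kurtosis β₂ of the total. That reading is an inference: a single die has excess kurtosis −222/175 ≈ −1.27, and for a sum of n independent dice this excess is divided by n. Function names are hypothetical.

```python
from fractions import Fraction


def dice_total_counts(n):
    """Ways (out of 6**n) of obtaining each total with n fair dice, by convolution."""
    counts = {0: 1}
    for _ in range(n):
        nxt = {}
        for total, ways in counts.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0) + ways
        counts = nxt
    return dict(sorted(counts.items()))


def exact_moments(n):
    """Exact mean, variance and beta_2 (kurtosis) of the total on n dice."""
    counts, total = dice_total_counts(n), 6**n
    mean = Fraction(sum(t * w for t, w in counts.items()), total)
    var = sum((t - mean) ** 2 * w for t, w in counts.items()) / total
    m4 = sum((t - mean) ** 4 * w for t, w in counts.items()) / total
    return mean, var, m4 / var**2


print(list(dice_total_counts(3).values()))   # the 216 x probability row for 3 dice
for n in (1, 3, 6):
    mean, var, beta2 = exact_moments(n)
    print(n, mean, var, float(beta2))
```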


3. SAMPLING EXPERIMENTS 


In any one course one would never use all of the experiments here 
suggested. Nevertheless it seems desirable to list them all. 


3.1. Sample Frequency Distribution 


A simple and effective demonstration of the binomial may be had by 
casting 6 dice and counting the number of aces appearing on each such 
cast. One hundred casts will give an interesting comparison between 
the observed frequencies of numbers of aces and the theoretical, from 
$100\,(5/6 + 1/6)^6$. The chi-square test of goodness of fit may be used, if
desired.
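The ace-counting demonstration and its chi-square check can be sketched as follows. The expected frequencies are the terms of 100(5/6 + 1/6)⁶. Classes with three or more aces are pooled before the chi-square is computed because their expected counts are small; that pooling point is an assumption of this sketch. The statistic is then compared with 7.81, the 5 per cent point of chi-square with 3 degrees of freedom.

```python
import random
from math import comb


def expected_aces(casts=100, dice=6):
    """Terms of casts * (5/6 + 1/6)**dice: expected casts showing k aces."""
    return [casts * comb(dice, k) * (1 / 6) ** k * (5 / 6) ** (dice - k) for k in range(dice + 1)]


def observed_aces(casts=100, dice=6, seed=12):
    rng = random.Random(seed)
    freq = [0] * (dice + 1)
    for _ in range(casts):
        freq[sum(rng.randint(1, 6) == 1 for _ in range(dice))] += 1
    return freq


def chi_square(observed, expected, pool_from=3):
    """Chi-square statistic after pooling classes pool_from and above (pooling point
    is an assumption); returns (statistic, degrees of freedom)."""
    obs = observed[:pool_from] + [sum(observed[pool_from:])]
    exp = expected[:pool_from] + [sum(expected[pool_from:])]
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return stat, len(obs) - 1


obs, exp = observed_aces(), expected_aces()
stat, df = chi_square(obs, exp)
print(obs, [round(e, 1) for e in exp], round(stat, 2), df)  # compare with 7.81
```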





2 White fiber chips may be bought from Lamb Seal and Stencil Co., 824 13th St., N.W., Washington, 
D. C. However, if one can find the right industrial company he can obtain thousands of fiber discs free, 
since they are scrap in many processes. 
Binomial samples from beads may be taken with much smaller frac-
tion defective, thus illustrating better the typical situation in industrial
lots. Samples from any of the populations in Tables 1 and 2, or dice 
totals make good examples to compare theory and sample. 


TABLE 2
APPROXIMATELY NORMAL POPULATIONS

[The frequency columns for the populations, tabulated by value of X,
are not legible in this copy; the legible summary entries are 200, 0,
1.71, and 3.02.]


3.2. Control Charts


Effective use of the foregoing populations may be made to illustrate
X̄, R, σ, p, np, and c charts. One may take samples to compare against
center line and limits calculated from the known population charac-
teristics. But perhaps more interesting is the case of “no standard
given.” A series of preliminary samples is drawn and control lines
figured. Control is checked, the lines continued, and new “production” 
analyzed. The population can then be changed by increasing p’ for 
fraction defective, or by using a different measurement or Poisson 
population. Thus one might use Population A of Table 2, then shift to 
B or D, or use c’=4 in Table 1, then shift to c’=8 or 1, and observe 
the effect on the chart. 
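The "no standard given" routine for a c chart can be sketched with simulated Poisson draws standing in for the chips. In this sketch, 25 preliminary counts from c’ = 4 give c̄, the limits are set at c̄ ± 3√c̄ (a negative lower limit is replaced by 0), and then "production" switches to c’ = 8. The Knuth-style Poisson generator and all sample sizes are assumptions chosen for illustration. In class, control would be checked on the preliminary data before the lines were continued.

```python
import math
import random


def poisson_draw(rng, mean):
    """Poisson variate by Knuth's multiplication method (adequate for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(4)
preliminary = [poisson_draw(rng, 4) for _ in range(25)]   # "production" from c' = 4
c_bar = sum(preliminary) / len(preliminary)
ucl = c_bar + 3 * math.sqrt(c_bar)
lcl = max(0.0, c_bar - 3 * math.sqrt(c_bar))
later = [poisson_draw(rng, 8) for _ in range(100)]       # population changed to c' = 8
out = [c for c in later if c > ucl or c < lcl]
print(f"c-bar {c_bar:.2f}, limits ({lcl:.2f}, {ucl:.2f}), {len(out)} of {len(later)} points out")
```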

Other interesting experiments have to do with ways in which samples
can be taken [5], [2], [3]. For example, a stratified sample may be taken
by letting X₁ be the total points for 2 dice, X₂ that for 3 dice, X₃ that for
4 dice, and X₄ that for 5 dice. Then the first X₁, X₂, X₃, and X₄ together
comprise the first sample, etc. Such a sample is stratified because it
contains one value from each of 4 populations. The range for such samples
will be so inflated by the difference between means, especially those of X₁
and X₄, that both charts for X̄ and R will show “too good” control. On the
other hand, if the first four X₁’s from the same data are taken in one
sample, then four X₂’s, etc., and all the data are put on one X̄ and R chart,
the X₁’s and X₄’s will stand out beyond the control limits like “sore
thumbs,” because they are out of control with respect to the rational
sample variability. The same experiment may be readily done with 3
distributions A, one of B and one of C. Such populations can then be
mixed, and samples drawn, illustrating how random samples from
mixed product show good control despite the presence of assignable
causes.
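The stratified-versus-rational contrast can be simulated directly. The sketch below generates 40 rows of (X₁, X₂, X₃, X₄) from 2, 3, 4, and 5 dice. It then computes X̄ and R limits from the data themselves, once with each row taken as a sample of four (stratified) and once with four consecutive values of the same Xⱼ taken as a sample (rational). The factors A₂ = 0.729 and D₄ = 2.282 are the standard Shewhart values for samples of four (D₃ = 0). The number of rows and the seed are arbitrary choices.

```python
import random

A2, D3, D4 = 0.729, 0.0, 2.282   # standard control chart factors for samples of 4


def dice_total(rng, k):
    return sum(rng.randint(1, 6) for _ in range(k))


def points_out(samples):
    """Count X-bar and R points outside limits computed from the samples themselves."""
    means = [sum(s) / len(s) for s in samples]
    ranges = [max(s) - min(s) for s in samples]
    grand, rbar = sum(means) / len(means), sum(ranges) / len(ranges)
    x_out = sum(1 for m in means if abs(m - grand) > A2 * rbar)
    r_out = sum(1 for r in ranges if r > D4 * rbar or r < D3 * rbar)
    return x_out, r_out


def experiment(rows=40, seed=7):
    rng = random.Random(seed)
    data = [[dice_total(rng, k) for k in (2, 3, 4, 5)] for _ in range(rows)]  # X1..X4
    stratified = data                                    # one value from each population
    rational = [[row[j] for row in data[i:i + 4]]        # four values from one population
                for j in range(4) for i in range(0, rows, 4)]
    return points_out(stratified), points_out(rational)


stratified_out, rational_out = experiment()
print("stratified (X-bar out, R out):", stratified_out)
print("rational   (X-bar out, R out):", rational_out)
```

With stratified samples, essentially no X̄ points fall outside the limits ("too good" control). With rational samples, many X₁ and X₄ means fall outside them.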


3.3. Significance Tests


Any of the standard significance tests may be made by using a sample
from one of the populations. Thus one can test the hypothesis
that X̄’ = 0, using either population A or D, and assuming either that
σ’ is known or unknown. Or one can test the same hypothesis against
populations B, C, or E. An interesting variation is to let each class mem-
ber make the test by drawing his own sample. In this way one would
expect about one class member in 20 to refute a true hypothesis, at the
5 per cent level. Also the class can thus build up a t, or a non-central t,
distribution.
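The "one member in 20" expectation can be checked by simulating a large class. Because the populations of Table 2 are not legible here, the sketch substitutes the total on 3 dice, whose mean 10.5 and standard deviation √8.75 are known exactly. Each simulated student draws 5 totals and makes a two-sided z test of the true hypothesis μ = 10.5 with σ known. The class size and sample size are assumptions. `statistics.NormalDist` supplies the critical value.

```python
import random
from statistics import NormalDist

MU, SIGMA = 10.5, (35 * 3 / 12) ** 0.5   # mean and s.d. of the total on 3 dice


def class_rejection_rate(students=4000, n=5, alpha=0.05, seed=20):
    """Each 'student' draws n totals of 3 dice and tests the true hypothesis
    mu = 10.5 with sigma known (z test); returns the fraction who reject."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(students):
        sample = [sum(rng.randint(1, 6) for _ in range(3)) for _ in range(n)]
        z = (sum(sample) / n - MU) / (SIGMA / n ** 0.5)
        rejections += abs(z) > z_crit
    return rejections / students


rate = class_rejection_rate()
print(f"fraction rejecting a true hypothesis: {rate:.3f} (about 1 in 20 expected)")
```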


3.4. Estimation 


The subject of biased and unbiased estimates of population parame- 
ters may be illustrated by drawing a series of small samples and tabu- 
lating the various kinds of estimates, say, of both population standard 
deviation and variance. 
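A tabulation of this kind might be simulated as follows. Again the total on 3 dice stands in for the chip populations (σ² = 8.75). The sketch averages, over many samples of 4, the variance with divisor n, the variance with divisor n − 1, and the square root of the latter. The first averages about 8.75 × 3/4 = 6.56, the second about 8.75, and the standard deviation estimate falls below σ even though its square is unbiased. The sample size and count are assumptions.

```python
import random


def estimate_averages(samples=20_000, n=4, seed=3):
    """Average, over many samples of n totals of 3 dice, of three estimates."""
    rng = random.Random(seed)
    sums = {"var_divisor_n": 0.0, "var_divisor_n_minus_1": 0.0, "sd_divisor_n_minus_1": 0.0}
    for _ in range(samples):
        x = [sum(rng.randint(1, 6) for _ in range(3)) for _ in range(n)]
        m = sum(x) / n
        ss = sum((v - m) ** 2 for v in x)
        sums["var_divisor_n"] += ss / n
        sums["var_divisor_n_minus_1"] += ss / (n - 1)
        sums["sd_divisor_n_minus_1"] += (ss / (n - 1)) ** 0.5
    return {k: v / samples for k, v in sums.items()}


averages = estimate_averages()
print(averages, "true variance 8.75, true s.d.", round(8.75 ** 0.5, 3))
```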
Confidence intervals may be set from each of a series of samples.
There is a beautiful illustration of this kind of experiment given by
Shewhart [1]. It is advisable to use 90 per cent confidence intervals so
that there is a reasonable chance for an interval to fail to contain the
parameter.
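The series-of-intervals experiment can be sketched as follows, assuming samples of 5 totals of 3 dice and σ known, so each 90 per cent interval is x̄ ± 1.645σ/√n. About one interval in ten should miss μ = 10.5. The dice population, sample size, and number of intervals are illustrative choices.

```python
import random
from statistics import NormalDist


def coverage(intervals=2000, n=5, level=0.90, seed=9):
    """Fraction of z intervals (sigma known) from 3-dice samples that contain mu = 10.5."""
    rng = random.Random(seed)
    mu, sigma = 10.5, (35 * 3 / 12) ** 0.5
    half_width = NormalDist().inv_cdf(0.5 + level / 2) * sigma / n ** 0.5
    hits = 0
    for _ in range(intervals):
        mean = sum(sum(rng.randint(1, 6) for _ in range(3)) for _ in range(n)) / n
        hits += mean - half_width <= mu <= mean + half_width
    return hits / intervals


cov = coverage()
print(f"observed coverage {cov:.3f} for nominal 0.90")
```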


3.5. Analysis of Variance


Obviously one can run an analysis of variance with all cell samples 
from distribution A, to illustrate the null hypothesis. Data for simple 
designs can be repeatedly drawn and calculations made to yield a dis- 
tribution of F values. Then one or two cell samples can be drawn from 
B or C and thus the null hypothesis stands to be rejected some of the 
time at least. (Of course one can more easily illustrate the formation 
of an F distribution with pairs of samples from A, or say from 4 dice, 
and the non-central F by a sample from A and one from D, or one 
with 4 dice then one with, say, 10 dice.) 
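A one-way analysis of variance on dice data can be sketched as follows. Every cell is a sample of 5 totals of 4 dice. Under the null hypothesis, with 3 cells (2 and 12 degrees of freedom), the F ratio should average about 12/10 = 1.2. Adding a constant shift to one cell is a hypothetical stand-in for drawing that cell from a different population such as B or C, and it pushes the F values up. The cell count, cell size, and shift are assumptions.

```python
import random


def one_way_f(groups):
    """F ratio (between-cell mean square / within-cell mean square)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def mean_f(reps=5000, k=3, n=5, shift=0, seed=11):
    """Average F when every cell holds n totals of 4 dice; `shift` is added to the
    last cell to mimic drawing it from a different population (hypothetical)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        groups = [[sum(rng.randint(1, 6) for _ in range(4)) for _ in range(n)] for _ in range(k)]
        groups[-1] = [x + shift for x in groups[-1]]
        total += one_way_f(groups)
    return total / reps


null_mean, shifted_mean = mean_f(), mean_f(shift=3)
print(f"mean F under null {null_mean:.3f}; with one shifted cell {shifted_mean:.3f}")
```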

Less obvious but equally valuable for illustration are experiments 
where the true cell mean is determined from any desired linear hypothe- 
sis, and then a random error component is added, such as a drawing 
from distribution A, or a toss of 3 dice. A linear or quadratic trend 
can readily be simulated. Such trends must be strong relative to the 
random error, unless large samples are to be drawn at each point. 

Tests for homogeneity of variance, such as Bartlett’s, and homo- 
geneity of proportions, such as chi-square, also can be readily illus- 
trated. 


3.6. Statistics of Combinations 


One good experiment to illustrate the tolerances for mating parts is 
often worth any amount of equations for the practical man. A success- 
ful one is the following: 


Outside diameter of shaft:
X̄’ = 3.00105”, σ’ = .000296”.
Throw 3 dice, and regard the number of points as the number of
ten-thousandths of an inch in excess of three inches.

Inside diameter of bearing:
X̄’ = 3.0021”, σ’ = .000418”.
Throw 6 dice, and regard the number of points as the number of
ten-thousandths of an inch in excess of three inches.
Draw the two frequency distributions, the 3σ limits for each, and the
true extreme possible values for each. The percentage of interference is
only about 2.6 per cent, as may be illustrated by experiment, but guesses
prior to experiment, based on the two frequency distributions shown,
will usually run much higher. Those who want no overlapping in even
the extreme ranges would never sanction such distributions, even
though they would probably be satisfactory in practice. Hence such
persons would specify a considerably greater mean difference, thus
giving too many loose fits.
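The interference percentage can be computed exactly by enumerating all 6⁹ equally likely combinations of the 3 shaft dice and the 6 bearing dice. The sketch counts a pair as interfering when the shaft total is at least the bearing total. Treating a zero-clearance pair as interference is an assumption made here. The stricter count (shaft strictly larger) is printed for comparison. Both figures are close to the 2.6 per cent quoted, even though the extreme ranges (3 to 18 and 6 to 36) overlap widely.

```python
def total_counts(dice):
    """Ways of obtaining each total with `dice` fair dice."""
    counts = {0: 1}
    for _ in range(dice):
        nxt = {}
        for total, ways in counts.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0) + ways
        counts = nxt
    return counts


shaft = total_counts(3)      # ten-thousandths of an inch over 3 in.
bearing = total_counts(6)
all_pairs = 6 ** 9
# Zero clearance (equal totals) is counted as interference: an assumption.
interference = sum(ws * wb for s, ws in shaft.items()
                   for b, wb in bearing.items() if s >= b) / all_pairs
strict = sum(ws * wb for s, ws in shaft.items()
             for b, wb in bearing.items() if s > b) / all_pairs
print(f"shaft >= bearing: {interference:.4f}; shaft > bearing: {strict:.4f}")
```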


3.7. Acceptance Sampling 


A wide variety of sampling experiments are regularly used to show 
the way various sampling inspection plans operate on “acceptable,” 
“marginal,” and “rejectable” material, or on mixed product. It can be 
shown how the consumer is protected against bad quality, on a lot-by-
lot basis and on an average-quality basis. An empirical operating char-
acteristic curve, showing the observed probability of acceptance for
given quality, may be built up. Not only attribute plans, but also vari-
ables plans may be illustrated in operation. For other possibilities, see
the original manual [5] and current manuals for short courses in quality
control, such as those at the Universities of Iowa, Illinois, and Michi-
gan.
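An empirical operating characteristic curve can be built up by simulation and set beside the theoretical one. The sketch uses a hypothetical single sampling plan, n = 50 with acceptance number c = 1: a lot is accepted when the sample holds at most one defective. Theoretical acceptance probabilities are binomial, which matches the large bead box. The plan and the quality levels are illustrative assumptions.

```python
import random
from math import comb


def oc_theoretical(p, n=50, c=1):
    """Binomial probability of accepting a lot with fraction defective p
    under the single sampling plan (n, c)."""
    return sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))


def oc_empirical(p, n=50, c=1, lots=2000, seed=5):
    """Observed fraction of simulated lots accepted."""
    rng = random.Random(seed)
    accepted = sum(sum(rng.random() < p for _ in range(n)) <= c for _ in range(lots))
    return accepted / lots


for p in (0.01, 0.02, 0.04, 0.08):
    print(p, round(oc_theoretical(p), 3), round(oc_empirical(p), 3))
```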


3.8. Sequential Analysis 


Experiments are especially effective in showing the way in which 
sequential analysis reaches its decision. The same plan can be tried on 
different populations, some “good” and others “bad.” Any of those 
listed in Section 2 can be so used. 
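A sequential plan can be tried on "good" and "bad" populations with a short simulation. The sketch implements Wald's sequential probability ratio test for fraction defective, using hypothetical design values p₀ = 0.02, p₁ = 0.08, α = 0.05, β = 0.10. Each item adds log(p₁/p₀) to the log likelihood ratio if defective, and log((1 − p₁)/(1 − p₀)) if good. Sampling stops at log((1 − β)/α) (reject) or at log(β/(1 − α)) (accept). A cap on the number of items is added only as a safeguard.

```python
import math
import random


def sprt(rng, p_true, p0=0.02, p1=0.08, alpha=0.05, beta=0.10, max_items=5000):
    """Wald SPRT for fraction defective; returns 'accept', 'reject' or 'undecided'."""
    upper = math.log((1 - beta) / alpha)         # reject when the log ratio reaches this
    lower = math.log(beta / (1 - alpha))         # accept when it falls to this
    step_defective = math.log(p1 / p0)
    step_good = math.log((1 - p1) / (1 - p0))    # negative
    llr = 0.0
    for _ in range(max_items):
        llr += step_defective if rng.random() < p_true else step_good
        if llr >= upper:
            return "reject"
        if llr <= lower:
            return "accept"
    return "undecided"


def decision_rates(p_true, trials=2000, seed=2):
    rng = random.Random(seed)
    results = [sprt(rng, p_true) for _ in range(trials)]
    return {k: results.count(k) / trials for k in ("accept", "reject", "undecided")}


good, bad = decision_rates(0.02), decision_rates(0.08)
print("good population:", good)
print("bad population: ", bad)
```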


3.9. Linear Correlation


One can easily illustrate sampling from an uncorrelated population 
by drawing independent random samples from the pair of populations. 
For example, let Y be the total points for a throw of 3 dice, while X is 
a drawing from population A. To illustrate a correlated population, one 
can draw a chip from A, and then use this as a correction to whatever 
one gets in a toss of 3 dice. Thus if a “minus 2” is drawn for X, and the 
throw yields 13, we record 11 for Y, etc. For this case, we have r=.5013, 
and a good approximation to a normal bi-variate population. It can 
in fact readily be shown that 





$$r = \sqrt{\frac{\sigma_X^2}{\sigma_X^2 + \sigma_Z^2}}$$

where $\sigma_X^2$ is the variance of the X values, and $\sigma_Z^2$ the variance of the Y
values before they are given the adjustment from X.

If weaker population correlations are desired, we may throw more 
dice for the original Y value, thus increasing $\sigma_Z^2$. Stronger correlations
than .5 are obtainable through using distribution D for X and, say, 3 
dice for the original Y, or else letting X be a drawing from distribution 
D, and Y this drawing plus a drawing from A. The respective values 
of r are .7608 and .8965. 
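The formula for r can be checked by simulation without the chips. Because population A's frequencies are not legible here, the sketch substitutes a hypothetical X: the total on 2 dice minus 7, with variance 35/6. Z is a toss of 3 dice (variance 8.75), and Y = Z + X. The formula then gives r = √(5.833/14.583) = √0.4 ≈ 0.632, and the sample correlation of 20,000 pairs should be close to that.

```python
import random

VAR_X = 35 * 2 / 12   # hypothetical stand-in for population A: 2-dice total minus 7
VAR_Z = 35 * 3 / 12   # toss of 3 dice before adjustment
r_theory = (VAR_X / (VAR_X + VAR_Z)) ** 0.5


def sample_r(pairs=20_000, seed=13):
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(pairs):
        x = sum(rng.randint(1, 6) for _ in range(2)) - 7   # the "chip" X
        z = sum(rng.randint(1, 6) for _ in range(3))       # the dice toss
        xs.append(x)
        ys.append(z + x)                                    # Y = toss corrected by X
    mx, my = sum(xs) / pairs, sum(ys) / pairs
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


print(round(r_theory, 4), round(sample_r(), 4))
```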


3.10. Curve-fitting 


In curve-fitting one can take data from an exact mathematical curve 
at equally spaced X’s, and find one Y value for each such X, by taking 
the mathematical value and adding a random error. The latter should 
be small relative to the variation in the mathematical curve if a close 
relationship is desired. The formula in Section 3.9 can be used to give
the true correlation ratio for the non-linear population. Thus we have

$$\eta = \sqrt{\frac{\sigma_W^2}{\sigma_W^2 + \sigma_Z^2}}$$

where $\sigma_W^2$ is the variance of the true curve ordinates W and $\sigma_Z^2$ is the
variance of the error component added to W.
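The correlation-ratio formula can be tried on a hypothetical curve. The sketch takes W = 0.5(x − 5)² at x = 0, 1, ..., 10, so σ_W² = 19.5. The error is the centered total on 3 dice (σ_Z² = 8.75), which gives η = √(19.5/28.25) ≈ 0.83. To make the empirical correlation ratio settle down, the sketch draws 50 Y values at each X rather than the single value per X described in the text. The curve and the replication are assumptions.

```python
import random


def curve(x):
    """Hypothetical 'exact mathematical curve' W = 0.5 (x - 5)**2."""
    return 0.5 * (x - 5) ** 2


XS = range(11)                                        # equally spaced X's
W = [curve(x) for x in XS]
w_bar = sum(W) / len(W)
var_w = sum((w - w_bar) ** 2 for w in W) / len(W)     # variance of true ordinates
var_z = 35 * 3 / 12                                   # error: centred total of 3 dice
eta_theory = (var_w / (var_w + var_z)) ** 0.5


def eta_empirical(reps=50, seed=17):
    """Correlation ratio of Y on X from `reps` Y values per X."""
    rng = random.Random(seed)
    groups = [[curve(x) + sum(rng.randint(1, 6) for _ in range(3)) - 10.5
               for _ in range(reps)] for x in XS]
    values = [y for g in groups for y in g]
    grand = sum(values) / len(values)
    ss_total = sum((y - grand) ** 2 for y in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return (ss_between / ss_total) ** 0.5


print(round(eta_theory, 4), round(eta_empirical(), 4))
```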


3.11. Research Problems 


Frequently a research worker is unable to find the distribution of 
some statistic. Recourse may then be had to samples from a suitable 
population. For example, an engineer in design wanted the frequency 
distribution of a compounding of eccentricities at random angle. Thus 
he wanted the distribution of Z in 


$$Z^2 = X^2 + Y^2 + 2XY\cos\theta$$


where the distributions of X and Y were known and θ was rectangularly
distributed 0° to 180°. The approximate mean and variance were found, 
but a sampling experiment was needed to determine the approximate 
shape of the distribution. 
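A sampling experiment of this kind takes only a few lines. The engineer's actual distributions of X and Y are not given, so the sketch assumes normal eccentricities with hypothetical means 4 and 3 and standard deviations 0.5, and draws θ uniformly on 0° to 180°. Since the average of cos θ over that range is zero, the mean of Z² should equal E(X²) + E(Y²) = 25.5. A crude text histogram shows the shape.

```python
import math
import random


def resultant(rng, mean_x=4.0, sd_x=0.5, mean_y=3.0, sd_y=0.5):
    """Z from Z^2 = X^2 + Y^2 + 2XY cos(theta), theta uniform on 0..180 degrees.
    The normal X and Y and their parameters are hypothetical."""
    x = rng.gauss(mean_x, sd_x)
    y = rng.gauss(mean_y, sd_y)
    theta = rng.uniform(0.0, math.pi)
    return math.sqrt(max(0.0, x * x + y * y + 2 * x * y * math.cos(theta)))


rng = random.Random(180)
z = [resultant(rng) for _ in range(50_000)]
mean_z2 = sum(v * v for v in z) / len(z)
print(f"mean of Z^2: {mean_z2:.2f}")
# Crude text histogram of Z in unit-wide classes.
for lo in range(9):
    share = sum(lo <= v < lo + 1 for v in z) / len(z)
    print(f"{lo}-{lo + 1}: {'#' * round(200 * share)}")
```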

A typical class experiment would be to find empirically the sampling 
distribution of the standard deviation for a skewed or a J-shaped 
population. 


4. CONCLUSION 


The foregoing experiments could form the bulk of the theoretical 
discussion of a course or could be used only as an occasional
supplement to derivations. If carefully explained and interpreted relative to
some practical situation, such experiments can be most helpful. 
Readers and authors are invited to submit corrections to papers
published in any previous issue. These will be published each year, in 
the December issue. 


Grab, Edwin L., and Savage, I. Richard, TABLES OF THE EXPECTED
VALUE OF 1/X FOR POSITIVE BERNOULLI AND POISSON VARIABLES,
Vol. 49, No. 265 (March 1954), 169-77.

The following paper has recently been drawn to our attention: J. Ti- 
ago de Oliveira, “Sur le calcul des moments de la réciproque d’une 
variable aléatoire positive de Bernoulli et Poisson,” Anais da Faculdade 
de Ciências do Porto, 36 (1952), 5-8.

The material in de Oliveira’s paper suggests alternative methods for 
computing the tables of our paper. 


Laderman, J., Littauer, S. B., and Tukey, John W., THE INVEN-
TORY PROBLEM, Vol. 48, No. 261 (December 1953), 717-732.
The following corrections should be made: 


Page   Line                    Reads       Should Read
723    12 (2nd display line)   for dsSy.   for d2y.
726    16                      henec       hence
729    17                      225         225


Roshwalb, Irving, EFFECT OF WEIGHTING BY CARD-DUPLICATION ON
EFFICIENCY OF SURVEY RESULTS, Vol. 48, No. 261 (December 1953),
773-777.

The first term within the brackets of expressions (3) and (6) should 
read 4n(P —N)/(P—1) instead of 4n(P—n)/(P—1). As a consequence, 
the coefficient of r in the denominator of (4) should be (1+)? instead 
of (5d² − 2d + 1). These changes do not affect the tabulated relative
information in the case of sampling with replacement and have no
significant effect except in the case of high sampling rates.


Savage, L. J., THE THEORY OF STATISTICAL DECISION, Vol. 46, No. 253
(March 1951), 55-67.

On page 60, in lines 19, 32, and 36 change $1 to $1/2; and in line 36 
change $2 to $4. 
All communications concerning this section should be addressed to
the Department of Statistics, University of North Carolina, Chapel
Hill, North Carolina.


Aroian, Leo A., “What Makes a Quality 
Control Chart Tick,” Industrial Quality 
Control, 10 (1954), 38-43. 


The concepts of errors of type I and type
II are presented for control charts based on
fraction defective and number of defectives. 
Tables and charts are included showing the 
relation between the two types of errors as 
functions of selected sample sizes and qual- 
ity levels. Two probability models are con- 
sidered when the process is out of control: 
(1) it is operating at a new constant level, 
(2) it is operating at a different level at each 
decision point. Gerald J. Lieberman,
Stanford University. 
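The first of the two probability models (the process running at a new constant level) can be illustrated for a number-of-defectives chart. The sketch sets 3σ limits at a standard p₀ and counts a point as "in control" when the plotted count lies within those limits. It then computes the type I error at p₀ and the type II error at a shifted p₁ from exact binomial sums. The sample size and quality levels are hypothetical and are not taken from Aroian's tables.

```python
import math


def binom_cdf(k, n, p):
    """P(D <= k) for D ~ binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(min(k, n) + 1))


def np_chart_errors(n, p0, p1):
    """Type I and type II error probabilities for one point on an np chart whose
    3-sigma limits are set at the standard p0, when the process has moved to p1."""
    half = 3 * math.sqrt(n * p0 * (1 - p0))
    lo = max(0, math.ceil(n * p0 - half))   # smallest count plotted inside the limits
    hi = math.floor(n * p0 + half)          # largest count plotted inside the limits

    def inside(p):
        return binom_cdf(hi, n, p) - binom_cdf(lo - 1, n, p)

    return 1 - inside(p0), inside(p1)


for p1 in (0.08, 0.10, 0.15):
    type_1, type_2 = np_chart_errors(100, 0.05, p1)
    print(f"p1 = {p1}: type I {type_1:.4f}, type II {type_2:.4f}")
```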


Borch, Karl, “Effects on Demand of 
Changes in the Distribution of Income,” 
Econometrica, 21 (1953), 325-31. 


This paper reports the results of a par- 
ticular attempt to lend a more explicit 
economic interpretation to effects fre- 
quently attributed to “time” variables in 
econometric analyses. The particular em- 
pirical results on which this further analysis 
is based are those of Prest (The Review of 
Economics and Statistics, 31: 1, 1949), as 
revised by Farrell (Econometrica, 20: 2, 
1952), for beer, spirits, and tobacco. Prest 
analyzed demand for these three goods in 
the United Kingdom, using time series data 
for the period 1870-1938, omitting 1915- 
1919. The basic form of the demand relation 
used by Prest was $C_t = kY_t^{a}P_t^{c}e^{dt + fz_t}$,
where C and Y are, respectively, consump-
tion and income per capita, P is price de-
flated by a cost-of-living index, t is time in
years, and z is a discontinuity variate tak-
ing value 0 for 1870-1914 and 1 for 1920-
1938. In Prest’s results the time variables
in the exponent were dominant variables in 
“explaining” the variation in consumption 
over the period. The present author’s hy- 
pothesis is that changing income distribu- 
tion over time might be an important eco- 
nomic factor underlying the highly signifi- 
cant coefficients of the time variables in 
Prest’s results. 

The author presents results of a particu- 
lar formulation consistent with his hy- 


pothesis. The form of the income distribu- 
tion function is assumed to be the logarith- 
mic-normal and income elasticity of con- 
sumption is assumed to be represented by 
E(y)=p+q/log y. In this framework the 
pattern of change in income distribution 
over time which would approximately ac- 
count for the effect of the trend found by 
Prest is determined. Taking Farrell’s re- 
vised estimates of the coefficients a, c, d, 
and f as given, the coefficients p and q in
the income elasticity expression are ap- 
proximated. Having values for p and q, the 
coefficient of variation of the income distri- 
butions at selected points in time are cal- 
culated. The results, apart from extreme 
values for spirits in two years, reflect a 
marked trend toward equality in the distri- 
bution of income. On a priori grounds this is 
considered an acceptable development of 
the income distribution. The p and q ob-
tained are also used to calculate income elas-
ticities corresponding to different levels of
income. It is concluded that the resulting elas-
ticities do not generally conform to what
would be expected intuitively. A main con- 
clusion of the author is the following: “One 
should not draw overly general conclusions 
from the rough calculations in this paper. 
The results seem, however, to indicate that 
changes in the distribution of income can 
play an important part in explaining time 
trends in demand functions.” Ivan M. Lz, 
University of California. 


Brown, T. M., “Standard Error of Forecast 
of a Complete Econometric Model,” 
Econometrica, 22 (1954), 178-92. 

The author develops and presents in 
matrix form approximate formulas for the 
estimation of the elements of the vector of 
the standard error of forecasts of the en- 
dogenous variables in a multiequation 
econometric model. The framework is de- 
veloped first with respect to a single equa- 
tion in which it is assumed that the condi- 
tions of the Markoff Theorem are met. For 
the single equation case, a general expres-
sion for the forecast variance is developed as
a sum of two components, viz., the vari-
ance of the estimated mean of Y for given Z
and the disturbance variance. This may be
expressed in matrix form as

$$\sigma^2(Y_F) = Z_F\,[\sigma(a)]\,Z_F' + \sigma^2,$$

where $Z_F$ is the vector of predetermined variables (assumed known) in
the forecast period, $\sigma(a)$ is the covariance matrix of estimated
coefficients, and $\sigma^2$ is the disturbance variance. In the single
equation case, $\sigma(a)$ and $\sigma^2$ are estimated by well-known methods.
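The single-equation forecast variance can be sketched with NumPy. The sketch assumes ordinary least squares with the disturbance variance estimated as s² = e′e/(T − m), and the coefficient covariance matrix estimated as s²(Z′Z)⁻¹. It then forms z_F[s²(Z′Z)⁻¹]z_F′ + s². The divisor T − m and the simulated data are assumptions for illustration; the abstract does not state Brown's exact choices.

```python
import numpy as np


def forecast_variance(Z, y, z_f):
    """Single-equation case: least-squares fit, then
    s^2(Y_F) = z_F [s^2 (Z'Z)^-1] z_F' + s^2. Returns (forecast variance, s^2)."""
    T, m = Z.shape
    a_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ a_hat
    s2 = float(resid @ resid) / (T - m)        # disturbance variance estimate
    cov_a = s2 * np.linalg.inv(Z.T @ Z)        # estimated covariance matrix of coefficients
    return float(z_f @ cov_a @ z_f) + s2, s2


# Hypothetical data: intercept plus one predetermined variable, T = 30 periods.
rng = np.random.default_rng(0)
T = 30
Z = np.column_stack([np.ones(T), rng.normal(size=T)])
y = Z @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=T)
var_f, s2 = forecast_variance(Z, y, np.array([1.0, 3.0]))
print(f"forecast variance {var_f:.4f} = coefficient part {var_f - s2:.4f} + disturbance {s2:.4f}")
```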

In the multiequation complete econometric model, the above forecast
variance becomes a vector containing as many elements as there are
endogenous variables in the system. Each element of this vector may be
viewed as the sum of two components analogous to those specified for the
single equation case. The multiequation complete econometric model may
be expressed as $\beta Y' + \Gamma Z' = AX' = u'$, where Y, Z, and u are,
respectively, vectors of endogenous variables, predetermined variables,
and unobservable disturbance terms; β and Γ are population coefficient
matrices; X = [Y Z]; and A = [β Γ]. Let the estimated model be
represented by $BY' + CZ' = \hat{A}X' = \hat{u}'$, where B, C, and û are
estimates of β, Γ, and u, respectively. Written in a form most suitable
for forecasting Y for given Z, the estimated model becomes

$$Y' = -B^{-1}CZ' + B^{-1}\hat{u}' = FZ' + \hat{v}';$$

û has been designated “partial residuals” and v̂ “total residuals.” The
system expressed in the above form is referred to as the “forecast
reduced form.” Let a* be a vector containing all nonzero, nonunit
elements of the estimated matrix Â. Then, since $F = -B^{-1}C$, each
element $f_{ij}$ of F is a function of the elements of a*. A forecast of
the ith endogenous variable ($Y_{iF}$) can be calculated from the ith
equation of the above “forecast reduced form” system. By analogy with
the single equation development, the estimated forecast variance is the
sum of two components and may be written

$$S^2(Y_{iF}) = Z_F\,[S(f_i)]\,Z_F' + S_{v_i}^2,$$

where $Z_F$ is the vector of “known” values of the predetermined
variables in the forecast period, $S(f_i)$ is the estimated covariance
matrix of the estimated coefficients $f_i$, and $S_{v_i}^2$ is the
estimated variance of “total residuals” $v_i$ in the ith equation. In the
procedure outlined, $S_{v_i}^2$ is obtained directly from the expression
$S_{v_i}^2 = (1, -f_i)\,[M_{x_ix_i}]\,(1, -f_i)'$, where $(1, -f_i)$ is the
coefficient vector and $M_{x_ix_i}$ is the moment matrix of variables in
equation i. The elements of $S(f_i)$ are obtained from

$$S(f_i) = \left[\frac{\partial f_i}{\partial a^*}\right] S(a^*) \left[\frac{\partial f_i}{\partial a^*}\right]',$$

where $\partial f_i/\partial a^*$ is the Jacobian of the coefficients $f_i$
with respect to a*, and S(a*) is the covariance matrix of the estimated
structural coefficients a*. The elements of S(a*) are estimated from the
negative inverse of the matrix of second order partial derivatives of the
logarithmic likelihood function. The elements of both
$[\partial f_i/\partial a^*]$ and S(a*) are evaluated at the point of
sample estimates a*.

In conclusion, a few remarks are offered 
with respect to degrees of freedom in small 
samples, confidence versus tolerance inter- 
vals, and reduced form equations (coef- 
ficients of the reduced form estimated di- 
rectly by least squares) versus the “forecast 
reduced form” as a basis for forecasting. 
The suggested rules in the case of small sam- 
ples are taken directly by analogy from the 
single equation case in which Markoff con- 
ditions are met and are recognized as pos- 
sibly quite inappropriate to the multi- 
equation model. A minor error is noted in 
the correction factor in equation 31, which, 
by analogy with least squares, should read 
$[T/(T-m)]^{1/2}$. Ivan M. Len, University of
California. 


Downton, F., “Least-squares Estimates 
Using Ordered Observations,” Annals of 
Mathematical Statistics, 25 (1954), 303-16. 

Ordered least-squares estimates are obtained for a class of 2-parameter
distributions of the form $f\{(x-\mu)/\sigma\}/\sigma$. The general
expressions for the quantities needed to evaluate the estimates of μ and
σ are given for this case. In any special case, substitutions can be made
in the general formulas to give the particular estimates. The ordered
least-squares estimates of 2 distributions, viz. the rectangular and the
right triangular, are given as special cases. The expected values, the
variance matrix, the estimates and their variances are calculated for the
latter distribution for samples up to size 10. Furthermore, the single
parameter system of the form $f(x/\lambda)/\lambda$ is considered, and from
this are derived the ordered least-squares estimate of λ and the
expressions for the quantities needed to calculate the estimate. In the
case of the exponential distribution $f(x) = e^{-x/\lambda}/\lambda$, the
estimate of λ is the sample mean. A Pearson Type III distribution,
depending upon a single dispersion parameter, is discussed and the ordered
least-squares estimate turns out to be identical with the maximum
likelihood estimate. A. E. Sarhan, University of North Carolina.
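The exponential result can be verified numerically. The sketch builds the expected values and covariance matrix of the standardized exponential order statistics for a sample of n. It uses the standard results E[X₍ᵢ₎] = Σₖ≤ᵢ 1/(n − k + 1) and Cov[X₍ᵢ₎, X₍ⱼ₎] = Σₖ≤min(i,j) 1/(n − k + 1)². It then forms the generalized least-squares (Lloyd-type) weights Ω⁻¹α/(α′Ω⁻¹α). Every weight comes out as 1/n, so the estimate of λ is the sample mean. This illustrates the stated special case; it is not Downton's general derivation.

```python
import numpy as np


def exponential_blue_weights(n):
    """Ordered least-squares weights for lambda in f(x) = exp(-x/lambda)/lambda,
    from exact moments of standardized exponential order statistics."""
    inv = 1.0 / (n - np.arange(n))               # 1/n, 1/(n-1), ..., 1
    alpha = np.cumsum(inv)                       # expected standardized order statistics
    idx = np.arange(n)
    omega = np.cumsum(inv**2)[np.minimum.outer(idx, idx)]   # covariance matrix
    w = np.linalg.solve(omega, alpha)            # Omega^{-1} alpha
    return w / (alpha @ w)                       # scale so the estimate is unbiased


rng = np.random.default_rng(5)
x = np.sort(rng.exponential(scale=2.0, size=10))
w = exponential_blue_weights(10)
print(np.round(w, 6), float(w @ x), float(x.mean()))
```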


Durbin, J., “Some Results in Sampling 
Theory when the Units Are Selected with 
Unequal Probabilities,” Journal of the 
Royal Statistical Society, Series B, 15 
(1953), 262-69. 

A rule is given for calculating estimates 
of sampling error that can be applied to mul- 
tistage designs of any degree of complexity 
and which is an extension of the rule given
by Yates for the case of equal probabilities 
of selection. The relation between the theo- 
ries of sampling with and without replace- 
ment is discussed and two approximate 
procedures are described which are easy to 
apply but lead to slight overestimates of the 
sampling error. T. S. Russell, Virginia
Polytechnic Institute. 


Epstein, B., and Sobel, M., “Some Methods
Relevant to Life Testing from an Exponen-
tial Distribution,” Annals of Mathematical
Statistics, 25 (1954), 373-81. 


The authors consider N items for life
testing, divided into k sets $S_j$, each
containing $n_j$ items, and based on the
two-parameter exponential distribution
$(1/\theta)e^{-(x-A)/\theta}$, $A \le x < \infty$. Each set
is observed until the first $r_j$ failures occur.
Three different cases are considered,
according as the $n_j$ items of each set have a
common known or unknown $A_j$, or the N
items have a common unknown A. Some
preliminary lemmas and corollaries are given
concerning the first r ordered observations
out of n from the given exponential
distribution. The maximum likelihood
estimate $\hat\theta$ of $\theta$ is obtained in the three
cases. The article shows that the random
variable $2R\hat\theta/\theta$, where $R = \sum_j r_j$ is the
number of observed failures, is distributed
as $\chi^2(2R)$, $\chi^2(2R-2k)$, and $\chi^2(2R-2)$ in
the three cases respectively. For the case in
which the N items have a common unknown
A, confidence limits for $\theta$ not involving A,
and for A not involving $\theta$, are worked out.
The authors show that the three cases are
equivalent to assuming that the life
distribution in the various sets will plot as
straight lines whose slopes can be estimated
by their results. A. E. Sarhan, University of
North Carolina.
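This sketch assumes that, in all three cases, the maximum likelihood estimate can be written as total time on test divided by R. In case 2, $A_j$ is estimated by the first failure in its set. In case 3, A is estimated by the earliest failure overall. The function returns the degrees of freedom quoted in the abstract. The names are illustrative only.

```python
def exponential_theta_mle(sets, case, known_A=None):
    """MLE of theta for type II censored samples from (1/theta)exp(-(x-A)/theta).

    sets:    list of (n_j, failures_j); failures_j are the first r_j failure
             times observed among the n_j items of set j.
    case:    1 = known A_j (known_A gives one value per set),
             2 = unknown A_j, estimated separately in each set,
             3 = common unknown A for all N items.
    Returns (theta_hat, df); 2*R*theta_hat/theta is chi-square with df d.f.
    """
    k = len(sets)
    R = sum(len(f) for _, f in sets)
    if case == 1:
        A = list(known_A)
        df = 2 * R
    elif case == 2:
        A = [min(f) for _, f in sets]            # first failure in each set
        df = 2 * R - 2 * k
    elif case == 3:
        common = min(min(f) for _, f in sets)    # earliest failure overall
        A = [common] * k
        df = 2 * R - 2
    else:
        raise ValueError("case must be 1, 2 or 3")
    total = 0.0
    for (n, f), a in zip(sets, A):
        f = sorted(f)
        # Observed failure times plus (n_j - r_j) survivors censored at the
        # r_j-th failure, all measured from A_j
        total += sum(x - a for x in f) + (n - len(f)) * (f[-1] - a)
    return total / R, df

print(exponential_theta_mle([(5, [1.0, 2.0, 3.0])], case=1, known_A=[0.0]))
```

Confidence limits for $\theta$ then follow from chi-square quantiles of $2R\hat\theta/\theta$ with the returned degrees of freedom.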


Gartaganis, Arthur J., “Autoregression in 
the United States Economy, 1870-1929,” 
Econometrica, 22 (1954), 228-43. 
Correlograms for 1 to 6 lags were ob- 
tained for each of 83 economic series taken 
from Burns (Production Trends in the 
United States Since 1870). The series were
classified into agriculture, mining, and
manufacturing sectors, similar to those
defined by Burns. The time series covering the
period 1870-1929 were broken into two 
periods (1870-1913 and 1914-1929) for 
analysis and comparison of autoregressive 
structure. Mean correlograms were con- 
structed for each sector and for each period. 
Results of approximate tests, based on 
“Student’s” t, for differences in mean auto- 
correlations between time periods within 
sectors and between sectors within time 
periods for each lag are reported. Significant
differences, “with minor exceptions,” are 
reported between time periods within seg- 
ments. A few significant differences appear 
between segments within time periods at 
several lags, but here the differences are 
not so marked. Autocorrelation coefficients 
for a few series with and without trend ad- 
justment (deflation by population series) 
are tabled for comparison. 

Although autoregressive structures are 
not calculated, it is asserted that graphic 
appraisal of the series during the 1870-1913 
period indicates that they are evolutive. 
For the 1914-1929 period, autoregressive 
structures are determined for the agriculture 
and mining sectors from the mean auto- 
correlations calculated. The roots of the 
characteristic equations of these structures 
suggest that the autoregressive systems 
are stationary. Because of “heterogeneity” 
the manufacturing sector is disregarded in 
this latter analysis. From the mean auto- 
correlations and the autoregressive co- 
efficients, approximations to mean autocor- 
relations for additional lags are obtained for 
agriculture and mining. The correlograms 
dampen and finally vanish. This suggests 
either a moving average or autoregressive 
type of structure. The author ventures the 
opinion that the structures are autoregres- 
sive. The author’s main conclusions are 
summarized as follows: “(1) The auto- 
regressive structure of the economy for the 
period 1870-1913 differs from that of the 
period 1914-1929 period. (2) Orcutt’s hy- 
pothesis (Journal of the Royal Statistical 
Society, Series B, 10: 1, 1948) that
Tinbergen’s series of the 1919-1931 period can
be considered as a sample drawn from a
population having the autoregressive
structure $y_t = 1.3y_{t-1} - .3y_{t-2} + e_t$, is not
empirically substantiated by our sample.” The
alternative hypothesis is stated that for the 
similar period, 1914-1929, the American 
economy can be considered as a population 
having different underlying autoregressive 
structures. Two of these have been
estimated in this paper. They are, for
agriculture and mining respectively,

$$x_t = .2974x_{t-1} - .0240x_{t-2} - .0561x_{t-3} - .0192x_{t-4} + .0463x_{t-5} - .0936x_{t-6} + e_t$$

and

$$x_t = .3654x_{t-1} + .2582x_{t-2} + .1384x_{t-3} - .2553x_{t-4} - .0798x_{t-5} + .1831x_{t-6} + e_t.$$

Ivan M. Lee, University of California.
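The abstract relies on two computations. The first is a correlogram. The second checks that every root of the characteristic equation of a fitted autoregression lies inside the unit circle. The following NumPy sketch shows both. The function names and the small tolerance are assumptions, not the author's procedure.

```python
import numpy as np

def correlogram(x, max_lag=6):
    """Sample autocorrelations r_1, ..., r_max_lag of a series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    return [float(np.sum(d[:-k] * d[k:]) / denom) for k in range(1, max_lag + 1)]

def is_stationary_ar(coefs, tol=1e-9):
    """True if x_t = a_1 x_{t-1} + ... + a_p x_{t-p} + e_t is stationary, i.e.
    every root of z^p - a_1 z^(p-1) - ... - a_p lies strictly inside the unit
    circle (roots within tol of the circle are treated as non-stationary)."""
    roots = np.roots([1.0] + [-a for a in coefs])
    return bool(np.all(np.abs(roots) < 1.0 - tol))

print(correlogram([1, -1, 1, -1], max_lag=1))
print(is_stationary_ar([0.5]), is_stationary_ar([1.3, -0.3]))
```

For Orcutt's structure, $z^2 - 1.3z + 0.3 = (z-1)(z-0.3)$ has a root on the unit circle. The check above therefore classifies it as non-stationary.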


Gulliksen, H., “A Least Squares Solution for
Successive Intervals Assuming Unequal
Standard Deviations,” Psychometrika, 19
(1954), 117-39.

A least squares solution is derived for the
scale values obtained by using the method of
successive intervals for the basic
observational data. The theoretical solution
depends upon solving simultaneously for the
scale values ($m_j$), the discriminal
dispersions ($s_j$), and the category
boundaries ($t_g$) which will minimize the
quantity

$$\frac{1}{b^2}\sum_j\sum_g\left(s_j z_{jg} + m_j - t_g\right)^2,$$

where $z_{jg}$ is a normal deviate corresponding
to an observed proportion and b is an
arbitrarily assigned standard deviation.

Numerically the direct least squares 
solution is laborious; methods for simplify- 
ing the computations are presented. A 
series of numerical examples compares the
relative accuracy of scales obtained from
various computational procedures. B. J.
Winer, University of North Carolina.


Gurland, John, “An Example of Auto- 
correlated Disturbances in Linear Regres- 
sion,” Econometrica, 22 (1954), 218-27. 

The author investigates the loss of effi- 
ciency of estimators of the regression param- 
eters when there are certain types of 
specification bias concerning the
disturbances. Let $y_i = \xi_i + u_i$ ($i = 1, 2, \ldots, n$),
where $\xi_i$, the expected value of $y_i$, is a
linear combination,
$\xi_i = \theta_1 x_{1i} + \theta_2 x_{2i} + \cdots + \theta_{k-1} x_{k-1,i} + \theta_k$,
of the unknown parameters $\theta$, and $E u_i = 0$.
The covariance matrix $\Omega$ of the disturbances
$u_i$ consists of elements $E u_i u_j = \sigma^2 \omega_{ij}$. It is
assumed that the x’s are “fixed” variates and
that the elements $\omega_{ij}$ are known. The
disturbances are assumed to be generated by a
first-order Markoff process, $u_t - \rho u_{t-1} = v_t$,
where $E v_t u_{t-1} = 0$ ($t = -N+1, \ldots, -1, 0, 1, \ldots, n$),
$E v_t = 0$, $E v_t v_{t'} = 0$ ($t \ne t'$), $E v_t^2 = \sigma^2$
($t = -N, -N+1, \ldots, -1, 0, 1, 2, \ldots, n$). The
“initial value” $u_{-N}$ is defined by $u_{-N} = \delta v_{-N}$,
where the value of $\delta$ is selected arbitrarily.
Define
$g_N^2 = 1 + \rho^2 + \rho^4 + \cdots + \rho^{2N} + \rho^{2N+2}\delta^2$.
If the true values of $\rho$ and $\delta$ are known, the
best linear unbiased estimates of the
parameters $\theta$ may be obtained by means of the
transformation
$\tilde y_t = y_t - \rho y_{t-1}$, $\eta_t = \xi_t - \rho \xi_{t-1}$, $\tilde x_{ht} = x_{ht} - \rho x_{h,t-1}$
($t = 2, 3, \ldots, n$; $h = 1, 2, \ldots, k$), with
$\tilde y_1 = y_1/g_N$, $\eta_1 = \xi_1/g_N$, and $\tilde x_{h1} = x_{h1}/g_N$,
and solving the k linear equations

$$\frac{\partial}{\partial \theta_h}\left[\frac{1}{g_N^2}(y_1 - \xi_1)^2 + \sum_{t=2}^{n}(\tilde y_t - \eta_t)^2\right] = 0.$$

If incorrect values are used for $\rho$ or $\delta$, or
both, the estimates, although unbiased, will
no longer be “best” in general.

Cochrane and Orcutt (Journal of the 
American Statistical Association, 44: 245, 
1949) recommend neglecting the first term 
in the above set of k linear equations,
claiming this is justified if the true value of
$\rho$ is close to 1. Assuming that the true value of
$\rho$ is known, the author investigates the loss
of efficiency from unjustifiably omitting
the term $(y_1 - \xi_1)^2/g_N^2$, that is, from assuming
$g_N$ extremely large when in fact it is not.
For the case k=2, the author derives the 
limiting expression for the joint efficiency 
of the estimated regression parameters 
under the incorrectly specified $g_N$ (denoted
by $g^*$). It is then shown that there exist
values of $x_i$ and $g^*$ in one case, and values of
$\rho$ and $g^*$ in another, for which the efficiency
is arbitrarily close to zero. The limiting
expressions derived hold also in the case of
evolutionary series for $u_t$. Three
interpretations of the assumption that $g^*$ is
very large are given.

Also investigated is the possible loss of
efficiency in assuming $u_t$ to be a stationary
process when, in fact, it has an initial fixed
value $u_{-N} = 0$. From the expression derived
for joint efficiency, it is concluded
that this incorrect specification could be a 
source of the considerable loss of efficiency 
of the estimates obtained by Cochrane and 
Orcutt from their series designated by (B). 

In an appendix, the joint efficiency of 
estimated regression parameters is derived 
for the general case of incorrectly specified 
disturbance covariance matrix. A minor 
notational omission appears on page 226 
where, in the covariance and joint
efficiency expressions, 2 should read 2°.
Ivan M. Lee, University of California.
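The transformation described above reduces estimation to ordinary least squares on transformed data. The following is a minimal NumPy sketch. It assumes that $\rho$ and $g_N$ are known and that the constant term enters as a column of ones. Setting `drop_first=True` mimics the Cochrane-Orcutt practice of omitting the first term. The function names are hypothetical.

```python
import numpy as np

def g_N(rho, N, delta):
    """g_N = sqrt(1 + rho^2 + ... + rho^(2N) + rho^(2N+2) * delta^2)."""
    s = sum(rho ** (2 * j) for j in range(N + 1)) + rho ** (2 * N + 2) * delta ** 2
    return float(np.sqrt(s))

def markov_blue(y, X, rho, g, drop_first=False):
    """Estimates of theta from the transformed equations.

    y: shape (n,); X: shape (n, k), one column per x_h (a column of ones
    carries the constant theta_k). rho and g (= g_N) are taken as known.
    drop_first=True omits the t = 1 term, as Cochrane and Orcutt recommend.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    # t = 2, ..., n: quasi-differences y_t - rho*y_{t-1}, x_ht - rho*x_{h,t-1}
    y_tr = y[1:] - rho * y[:-1]
    X_tr = X[1:] - rho * X[:-1]
    if not drop_first:
        # t = 1: divide the first observation by g_N
        y_tr = np.concatenate(([y[0] / g], y_tr))
        X_tr = np.vstack((X[0] / g, X_tr))
    # Ordinary least squares on the transformed data
    theta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return theta

x = np.arange(1.0, 9.0)
X = np.column_stack((x, np.ones_like(x)))
print(markov_blue(2 * x + 1, X, 0.6, g_N(0.6, 5, 1.0)))
```

Setting $\delta^2 = 1/(1-\rho^2)$ makes the formula for $g_N^2$ collapse to $1/(1-\rho^2)$ for every N. The first observation is then multiplied by $\sqrt{1-\rho^2}$, which is the familiar stationary-start scaling.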


Guttman, L., “Some Necessary Conditions 
for Common-Factor Analysis,” Psycho- 
metrika, 19 (1954), 149-61. 

One of the fundamental problems in 
common-factor analysis is: given a matrix 
of sample intercorrelations R having units 
in the diagonals, to find a diagonal matrix 
$U^2$ such that $G = R - U^2$ is a Grammian
matrix of minimum rank. Three theorems
which give lower bounds on the rank of G
are developed.

Let $s_1$ be the number of latent roots of
$G_1$ which are greater than or equal to unity.
For any matrix $U_1^2$ leaving $G_1$ Grammian,
the minimum rank of $G_1$ is shown to be
greater than or equal to $s_1$. If $U_2^2$ has as its
jth element the multiple correlation of
variable j with all other variables, then the
resulting $G_2$ will have minimum rank equal
to or greater than $s_2$. If $U_3^2$ has as its jth
element the highest zero order correlation
with the other variables, then the resulting
$G_3$ will have minimum rank equal to or
greater than $s_3$. The three lower bounds
for the rank r of G can be ordered as follows:
$r \ge s_2 \ge s_3 \ge s_1$. In practice $s_3$ will
generally be the simplest lower bound to
compute. B. J. Winer, University of North
Carolina.
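Two of the lower bounds can be computed directly from a correlation matrix. The first function below counts the roots of R that are at least 1. The second counts the positive roots after squared multiple correlations are placed in the diagonal. Reading the abstract's "multiple correlation" as its square is an assumption, and so are the function names and the tolerance.

```python
import numpy as np

def roots_at_least_one(R, tol=1e-10):
    """Number of latent roots of R (unities in the diagonal) that are >= 1."""
    eig = np.linalg.eigvalsh(np.asarray(R, dtype=float))
    return int(np.sum(eig >= 1.0 - tol))

def positive_roots_with_smc(R, tol=1e-10):
    """Number of positive latent roots of R after replacing each unity in the
    diagonal by that variable's squared multiple correlation with the rest."""
    R = np.asarray(R, dtype=float)
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    G = R.copy()
    np.fill_diagonal(G, smc)
    eig = np.linalg.eigvalsh(G)
    return int(np.sum(eig > tol))

R = [[1.0, 0.5], [0.5, 1.0]]
print(roots_at_least_one(R), positive_roots_with_smc(R))
```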


Hotelling, Harold, “New Light on the Cor- 
relation Coefficient and its Transforms,” 
Journal of the Royal Statistical Society, 
Series B, 15 (1953), 193-232. 

This paper presents parts of the derivation
of the distribution of r in a new form.
A method is given for obtaining the
probabilities associated with the elementary
case of a zero population correlation $\rho$, for
large or for small samples, without the use
of tables other than those of logarithms.
For $\rho \neq 0$ the slowly convergent series of
incomplete beta functions for the probability
integral given by Pearson is replaced by a
rapidly convergent series of such functions.

The moments of r about $\rho$ and the moments
of z are calculated by a new method. This
paper also examines the possibility of
improvement over the use of z. Certain
points in the mathematical theory of
correlation coefficients are simplified to
make their inclusion in future courses and
textbooks more feasible. The author uses n
for degrees of freedom; thus several of the
derivations and formulas associated with
the correlation distributions appear slightly
simpler than they would in terms of the
sample number. Clyde Y. Kramer, Virginia
Polytechnic Institute.
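For comparison with the paper's new series, the following sketch shows the conventional tools it builds on. These are the t test of $\rho = 0$ and an approximate interval for $\rho$ from Fisher's z. Both are written in terms of the sample size N, not Hotelling's degrees of freedom n. This is the textbook approach, not the paper's method, and the function names are hypothetical.

```python
import math

def t_for_zero_rho(r, N):
    """Student t statistic (N - 2 d.f.) for testing rho = 0 from N pairs."""
    return r * math.sqrt(N - 2) / math.sqrt(1.0 - r * r)

def fisher_z_interval(r, N, z_crit=1.96):
    """Approximate interval for rho, treating z = artanh(r) as normal
    with variance 1/(N - 3)."""
    z = math.atanh(r)
    half = z_crit / math.sqrt(N - 3)
    return math.tanh(z - half), math.tanh(z + half)

print(t_for_zero_rho(0.5, 11), fisher_z_interval(0.5, 28))
```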


Bryan, A. Hughes, “Methods for Analyzing
and Interpreting Physical Measurements 
of Groups of Children,” American Journal 
of Public Health, 44 (1954), 766-74. 


The article points out how three sta- 
tistical techniques can be valuable tools 
in analytical surveys of physical measure- 
ments of groups of school children. First, 
analysis of covariance should be used to 
adjust for variables (e.g. age, height) which 
can be measured but often not controlled
in sample selection. The second tool dis- 
cussed involves consideration of statistical 
methods developed in connection with bio- 
assay problems. The author indicates how 
these methods are applicable in describing 
the degree of sexual maturation of a group 
of children. The third technique is a multi- 
variate mathematical model to be used in 
studying the joint effect of several growth 
factors. In the appendix, the author men- 
tions some nonsampling errors that fre- 
quently are not taken into consideration by 
workers in anthropometric and nutrition 
research. Bernard G. Greenberg, University
of North Carolina.




King, E. P., “Probability Limits for the 
Average Chart When Process Standards 
Are Unspecified,” Industrial Quality Con- 
trol, 10 (1954), 62-64. 
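New control limits for $\bar X$ are presented
such that

$$P\left\{\frac{|\bar X_i - \bar{\bar X}|}{\sigma/\sqrt{n}} \le k_m\right\} = .95,$$

where m is the number of subgroups, n is
the subgroup size, and $i = 1, 2, \ldots, m$.
The control limits proposed are $\bar{\bar X} \pm C\bar R$,
where

$$C = k_m/(\sqrt{n}\,d_2).$$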

These limits provide “short run” $\bar X$
charts with approximately a constant Type
I error. A graph of C factors is included for
subgroup sizes of 2, 3, 4, 5, and 10 and for
sample numbers of 3 through 25. The C
factors presented are approximate. Gerald
J. Lieberman, Stanford University.
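Once $k_m$ is known, the proposed limits are simple to compute. The sketch assumes the standard $d_2$ constants for normal subgroups. It takes $k_m$ as a user-supplied input, because the abstract does not give its values. The names are hypothetical.

```python
import math

# Standard d2 constants (expected range / sigma) for normal subgroups of size n
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 10: 3.078}

def short_run_xbar_limits(subgroup_means, subgroup_ranges, n, k_m):
    """Limits grand_mean -/+ C * mean_range with C = k_m / (sqrt(n) * d2).
    k_m must be supplied by the user; it is not computed here."""
    if len(subgroup_means) != len(subgroup_ranges):
        raise ValueError("one range per subgroup mean is required")
    m = len(subgroup_means)
    grand_mean = sum(subgroup_means) / m
    mean_range = sum(subgroup_ranges) / m
    C = k_m / (math.sqrt(n) * D2[n])
    return grand_mean - C * mean_range, grand_mean + C * mean_range

print(short_run_xbar_limits([10.0, 12.0], [2.0, 2.0], 4, 3.0))
```

As a check, take $k_m = 3$ and n = 4. Then C = 3/(2 × 2.059) ≈ 0.729, which is the familiar $A_2$ factor of the conventional three-sigma chart.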


Klein, L. R., and Mooney, H. W., “Negro- 
White Savings Differentials and the Con- 
sumption Function Problem,” Economet- 
rica, 21 (1953), 425-56. 


Data from the 1947, 1948, 1949, and 1950 
Surveys of Consumer Finances form the 
basis for the analyses summarized in this 
paper. Survey schedules classified by re- 
gion, race, and income class suggest: (1) for 
the North, the ratio of savings to disposable
income for comparable income classes is 
higher for Negroes than for whites over the 
entire range of income and (2) for the South 
the savings-income ratio is higher for Ne- 
groes than whites in the lower income 
classes while the relative positions are re- 
versed in the higher income classes. The 
Survey data for the North are in line with 
relations suggested by the Consumer Pur- 
chases Study of 1935-36. The analyses sum- 
marized here bear on propositions advanced 
by the present authors and others previ- 
ously concerning the factors “explaining” 
the racial differential in savings behavior 
and the implications of racial differentials 
in the construction of aggregative consump- 
tion functions. 

In an early section mean residuals, by
racial groups, from regression equations
“explaining” the savings-income ratio are
presented. Separate equations for home- 
owners and nonhomeowners were calcu- 
lated from data supplied by an urban sub- 
sample of the 1949 Survey. Data from a 
larger sample of nonfarm, nonbusiness 
spending units served as the basis for an- 
other equation in which the characteristic, 
homeownership, was represented as a
separate variable in the analysis. Measures
used directly or in constructing explanatory
variables were: disposable income, liquid 
asset holdings, number of persons in spend- 
ing unit, age of spending unit head, and 
lagged disposable income. The residuals 
from regression in each equation were 
averaged for Negroes, whites and others. 
The mean of residuals for Negroes in each 
equation was positive and larger than for 
whites. The results were presented as sug- 
gestive, recognizing that the racial differ- 
ences in mean residuals are not statistically 
significant. 

Another main section of the paper re- 
ports results of variance analyses of mean 
savings-ratio deviations. For each regional- 
racial group, the deviation of each spending 
unit’s savings-income ratio from the mean 
ratio of its income class was calculated. In 
selected analyses, year served also as a sepa- 
rate variable, while in others data for the 
four years were combined. Additional vari- 
ables are then introduced on the basis of 
which the savings-ratio deviations are 
further classified. The mean of the devia- 
tions falling in each cell of the resulting 
multiple cross classifications is the random 
variable analyzed in a factorial design with 
one observation per cell. 

Among the additional variables on which 
classifications for the several variance 
analyses are based are: (1) liquid asset 
holdings, (2) past income change, and (3) 
job security. Significant main effects of the 
first two of the above variables are re- 
ported as well as significant interactions of 
one or more of these variables with race, 
region, and/or disposable income. For sev- 
eral of the tests, data giving rise to signifi- 
cant interactions are reproduced to facili- 
tate interpretation. Finally, reference is 
made to variance analysis results with cer- 
tain other variables, although the results 
are not reported in detail. 

A supplemental device employed in the 
paper is the presentation of percentage dis- 
tributions of spending units with respect 
to the several variables introduced by re- 
gional-racial-income classes. A brief section 
presenting and discussing the implications 
of such a percentage distribution with re- 
spect to credit use appears in a final section 
of the text of the paper. Ivan M. Lee,
University of California. 


Irwin, J. O., “On the Transition Prob- 
abilities Corresponding to Any Accident 
Distribution,” Journal of the Royal Statis- 
tical Society, Series B, 15 (1953), 87-89. 

From any known distribution of accidents
in a fixed exposure time T, the expected
number of other accidents sustained by a
person who has had x accidents is shown to
be the ratio of the $(x+1)$th to the $x$th
factorial moment of the distribution. The
limiting value of this ratio when the
exposure time tends to zero gives the
transition probabilities. From the form of
the frequency distribution the transition
probabilities are derived. G. L. Edgett,
Virginia Polytechnic Institute.
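The ratio in the abstract can be computed directly from a probability function. The sketch truncates the distribution at a finite number of terms. Under the usual mixed-Poisson reading of accident data, the ratio is the expected number of accidents in a further period of the same length. The function names are hypothetical.

```python
import math

def factorial_moment(pmf, k):
    """k-th factorial moment sum_x x(x-1)...(x-k+1) p(x), for a
    distribution on 0, 1, 2, ... given as a (truncated) list of p(x)."""
    total = 0.0
    for x, p in enumerate(pmf):
        falling = 1.0
        for j in range(k):
            falling *= x - j
        total += falling * p
    return total

def expected_further_accidents(pmf, x):
    """Ratio of the (x+1)th to the xth factorial moment."""
    return factorial_moment(pmf, x + 1) / factorial_moment(pmf, x)

poisson = [math.exp(-2) * 2 ** x / math.factorial(x) for x in range(80)]
neg_binomial = [math.comb(x + 1, x) * 0.25 * 0.5 ** x for x in range(200)]
print(expected_further_accidents(poisson, 3), expected_further_accidents(neg_binomial, 1))
```

For a Poisson distribution with mean 2, the ratio is 2 for every x, which means there is no accident proneness. For a negative binomial distribution, the ratio grows linearly with x.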


Lukacs, Eugene, “On Strongly Continuous 
Stochastic Processes,” Sankhyā, 13, Part 3
(1954), 219-28. 

The first theorem is concerned with the 
normality of increments of a strongly con- 
tinuous stochastic process. The proof makes 
use of the $\varepsilon, \delta$ definition of strong
continuity. Various properties of strongly
continuous processes are then derived and
used in the proofs of theorems 2 and 3. 
Theorem 2 states necessary and sufficient 
conditions for a stochastic process to be a 
Wiener process and the sufficiency of the 
conditions is demonstrated. Theorem 3, 
the last theorem of the paper, shows that 
the variance of a strongly continuous proc- 
ess with independent increments need not 
be independent of the time t. F. S. Mc-
Fre y, Virginia Polytechnic Institute.


Masuyama, Motosaburo, “Mathematical 
Note on Area Sampling,” Sankhyā, 13, Part
3 (1954), 241-42. 


The author gathers together some results
of integral geometry due to Poincaré,
Crofton, Blaschke, Santaló, and himself in
order to draw attention to their possible
application to statistical problems of area
sampling. R. J. Taylor, Virginia
Polytechnic Institute.


Matthai, Abraham, “On Selecting Random
Numbers for Large Scale Sampling,”
Sankhyā, 13 (1954), 257-60.

Random numbers in small scale work 
may be selected without much regard to 
cost considerations; but large scale work 
requires a method which reduces the labor 
of selection. 

In an example cited it is necessary to 
choose a random sample of 800 out of 
70,000. The selection rule is to assign ten
digits in a random number table to each
five-digit number to be selected, and to take
the last four digits prefixed by the first digit
to their left that is less than 7. To illustrate:


67 6345 0912 gives 50912 
25 2987 4391 gives 24391 


In a second case, one must select a set of 
random numbers less than 2853. A similar 
method of selection has an expected
rejection rate of 4.9%, as compared with
71.5% in the method of rejecting all four 
digit numbers greater than 2853 and with 
14.4% in the method of dividing by 3000 
and taking remainders. 

The appropriateness of the method is
established by $\chi^2$ tests. A. N. Pozner,
Virginia Polytechnic Institute. 
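The selection rule and the two comparison rejection rates can be checked mechanically. The sketch reproduces the two worked examples and the 71.5% and 14.4% figures. It assumes that the accepted numbers run from 0000 to 2853 and that the remainder method uses only 0000 to 8999. The abstract does not say what happens when none of the first six digits is below 7, so returning None in that case is an assumption. Matthai's own 4.9% method is not reconstructed, and the names are hypothetical.

```python
def five_digit_from_ten(ten_digits, limit=7):
    """Keep the last four of ten random digits and prefix the first digit,
    scanning leftwards through the other six, that is less than `limit`."""
    s = ten_digits.replace(" ", "")
    if len(s) != 10 or not s.isdigit():
        raise ValueError("expected ten decimal digits")
    last_four = s[-4:]
    for ch in reversed(s[:6]):
        if int(ch) < limit:
            return ch + last_four
    return None  # assumption: reject the set if no digit qualifies

def rejection_rate_direct(max_ok=2853):
    """Reject every four-digit number above max_ok."""
    return (10000 - (max_ok + 1)) / 10000

def rejection_rate_remainder(max_ok=2853, divisor=3000):
    """Use 0000-8999, take remainders mod divisor, reject remainders above max_ok."""
    usable = (10000 // divisor) * divisor
    accepted = sum(1 for v in range(usable) if v % divisor <= max_ok)
    return (10000 - accepted) / 10000

print(five_digit_from_ten("67 6345 0912"), five_digit_from_ten("25 2987 4391"))
print(rejection_rate_direct(), rejection_rate_remainder())
```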

McGill, W. J., “Multivariate Information
Transmission,” Psychometrika, 19 (1954), 
97-116. 

A model for handling multidimensional 
contingency tables in terms of information 
theory is developed. The method of anal- 
ysis used is analogous in some respects to 
the analysis of chi-square into its compo- 
nents. The sampling distributions of some 
of the statistics that are computed in the 
course of the analysis, particularly those 
concerned with interaction effects, have 
not been tabulated. When such tables be- 
come available, the method developed 
should provide the research worker with a 
useful analytic tool. 

The information transmitted from two
inputs, u and v, to an output, y, is defined
to be

$$T(u, v; y) = T(u; y) + T(v; y) + A(uvy),$$

where $T(u; y)$ and $T(v; y)$ represent the
bivariate transmissions in bit units and
$A(uvy)$ represents the interaction effect.
One measure of the interaction effect is
shown to be

$$A(uvy) = T_v(u; y) - T(u; y),$$

where $T_v(u; y)$ is the average information
transmitted between u and y for constant
values of v. Extension of this model to the
general multivariate case involving sev- 
eral orders of interaction is direct. 

A numerical example analyzing output 
information into the equivalent of main ef- 
fects and interaction effects is worked out 
in detail. Significance tests of the interac- 
tions involve the sampling distribution of 
differences between chi-square variables. 
Although the density function of this dis- 
tribution has been derived, tabled values of 
this function are said to be lacking. Ap- 
proximate distributions for other statistics 
involved in multivariate transmission have 
been developed. Certain of these distribu- 
tions useful in testing main effects are given. 
B. J. Winer, University of North
Carolina.
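The decomposition can be computed from a three-way table of counts. The sketch below uses plain relative-frequency estimates of the probabilities, which is an assumption. It computes each transmission in bits and obtains $A(uvy)$ as $T_v(u;y) - T(u;y)$. With these definitions, $T(u,v;y) = T(u;y) + T(v;y) + A(uvy)$ holds identically. The function names are hypothetical.

```python
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def transmission(joint):
    """T(a; b) in bits from a two-way table of counts or probabilities."""
    P = np.asarray(joint, dtype=float)
    P = P / P.sum()
    return entropy_bits(P.sum(axis=1)) + entropy_bits(P.sum(axis=0)) - entropy_bits(P)

def three_way_analysis(counts):
    """counts[u, v, y] -> T(u;y), T(v;y), T(u,v;y) and A(uvy) = T_v(u;y) - T(u;y)."""
    P = np.asarray(counts, dtype=float)
    P = P / P.sum()
    T_uy = transmission(P.sum(axis=1))               # collapse over v
    T_vy = transmission(P.sum(axis=0))               # collapse over u
    T_uvy = transmission(P.reshape(-1, P.shape[2]))  # (u, v) pairs as one input
    p_v = P.sum(axis=(0, 2))
    # T_v(u; y): transmission between u and y at each fixed v, averaged over v
    T_u_y_given_v = sum(p_v[j] * transmission(P[:, j, :])
                        for j in range(P.shape[1]) if p_v[j] > 0)
    return {"T(u;y)": T_uy, "T(v;y)": T_vy, "T(u,v;y)": T_uvy,
            "A(uvy)": T_u_y_given_v - T_uy}

xor = np.zeros((2, 2, 2))
for u in (0, 1):
    for v in (0, 1):
        xor[u, v, u ^ v] = 1
print(three_way_analysis(xor))
```

Consider an exclusive-or output, where y = 1 exactly when u and v differ. Neither input transmits anything on its own, and the whole 1 bit appears as interaction.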


Moran, P. A. P., “The Estimation of the 
Parameters of a Birth and Death Process,” 
Journal of the Royal Statistical Society,
Series B, 15 (1953), 241-46. 

The estimation of $\lambda/(\lambda+\mu)$ is considered
for a simple birth and death process. It is
equivalent to the problem of estimating the
probabilities of steps to the right and left
from an observed realization of a random
walk which has one absorbing boundary
and which is terminated, if necessary, after
a preassigned number of steps. The
properties of various estimators are considered.
T. S. Russell, Virginia Polytechnic Institute.
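In the random-walk formulation, each birth is a step to the right and each death a step to the left. The likelihood of an observed path is proportional to $p^B(1-p)^D$ whatever the stopping rule. The relative frequency B/(B+D) is therefore the maximum likelihood estimate of $p = \lambda/(\lambda+\mu)$. Its sampling properties, however, are affected by the absorbing boundary and the stopping rule. A minimal sketch follows; the function name is hypothetical.

```python
def estimate_birth_fraction(sizes):
    """Relative-frequency estimate of lambda/(lambda + mu) from successive
    population sizes of the embedded random walk (each change is +1 or -1)."""
    births = deaths = 0
    for prev, cur in zip(sizes, sizes[1:]):
        step = cur - prev
        if step == 1:
            births += 1
        elif step == -1:
            deaths += 1
        else:
            raise ValueError("each recorded change must be a single birth or death")
    if births + deaths == 0:
        raise ValueError("no transitions observed")
    return births / (births + deaths)

print(estimate_birth_fraction([3, 4, 5, 4, 5, 6]))
```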


Olkin, I., and Roy, S. N., “On Multivariate 
Distribution Theory,” Annals of Mathe- 
matical Statistics, 25 (1954), 329-39. 

The authors developed a matrix method 
of handling a large class of multivariate 
distribution problems including in particu- 
lar those for which the Wishart distribution 
is not available (e.g., the case of a sample 
of N observations from a p-variate normal 
population with $p > N - 1$). Two techniques
are used for evaluating the Jacobians of 
certain transformations. The first is
applied to obtain the joint distribution of the
coordinates. The second is applied to
obtain the joint distribution of the
roots of a determinantal equation. A. E.
Sarhan, University of North Carolina.


Podder, K. C., “On the Punched Card
Method in Smoothing for Age Bias in Census
Returns,” Sankhyā, 13, Part 3 (1954), 261-66.


A method of smoothing for age bias in
the preparation of the 1951 Census Tables
of India using Hollerith computing
equipment is described. A table showing the
method used and a specimen working table
are given. The actual machine operations
are described by means of a cycle chart and
control panel wiring diagrams. R. J.
Taylor, Virginia Polytechnic Institute.


Psychological Research Wing, “Multiple 
Factor Analysis of Personality Ratings in 
Services Selection Boards,” Sankhyā, 13
(1953), 17-26. 

The purpose of the investigation reported 
in this paper was the study of the functional 
unities underlying the checking of qualities 
on a rating scale used by Indian Army Se- 
lection Boards for the selection of officer 
candidates, and as such is a good example 
of a complete factor analysis. A sample of 
418 boys, each studied in relation to 21 
qualities, was taken, and the resulting
data were subjected to a complete centroid
factor analysis, utilizing Tucker’s criterion
for the stopping rule. Three factors were
extracted, which were felt to explain the
intercorrelations sufficiently. The resulting
matrix was rotated by the method of
extended vectors, and the three primary
factors were obtained and identified as:
(1) Intellectual Factor, (2) Social Factor,
and (3) Dynamic Factor. H. C. Sweeny,
Virginia Polytechnic Institute.


Sengupta, J. M., “Some Experiments with 
Different Types of Area Sampling for Winter 
Paddy in Giridih, Bihar: 1945,” Sankhyā,
13, Part 3 (1954), 235-41. 

The object was to study the relative 
efficiencies of different sampling units, 
with variations in the method of enumera- 
tion, for the estimation of acreage under 
winter paddy. Three different methods, 
using two different types of sampling units, 
were used, and these are discussed and compared
with respect to bias, cost and efficiency. 
Their advantages and disadvantages are 
given. Dante. Zaxicn, Virginia Poly- 
technic Institute. 


Sharma, O. C., “Factor Analysis of Tech- 
nical Trades and Educational Examination 
Marks of the Aircraftsmen of the Indian Air 
Force,” Sankhyā, 13 (1953), 27-34.


A factor analysis was made on the results 
of seven final examinations taken by 75 
aircraftsmen training for the Radio Tele- 
phone Operators and Telegraphists trade 
in the Indian Air Force. Five of these exam- 
inations were ‘trade’ tests, the other two 
being educational tests (mathematics and 
science). The factor analysis was done using 
two different techniques: (a) the Centroid 
Method, and (b) the Method of Principal 
Components. In each case, the analysis was 
carried out to three factors. These three 
factors accounted for 55.7% of the total 
variation in the Centroid Method and 
73.5% of the total variation in the Method 
of Principal Components. A stopping rule 
by Burt was used in each case. The result- 
ing factor matrices were rotated by means 
of the Method of Extended Vectors to 
verify the existence of simple structure. 
Both methods demonstrated the same fac- 
tor pattern. Three group factors were ob- 
tained and identified as: (1) Clerical 
Ability Factor, (2) Number Ability Factor, 
and (3) Technical Skill Factor. H. C. 
Sweeny, Virginia Polytechnic Institute. 
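In a principal components analysis, the share of total variation carried by the first k components equals the sum of the k largest latent roots of the correlation matrix divided by its trace. The sketch shows only that computation, and the function name is hypothetical. The paper's correlation matrix is not given here. The centroid-method percentage is computed differently.

```python
import numpy as np

def variance_accounted_for(R, n_factors):
    """Share of total variance (trace of R) carried by the largest
    n_factors latent roots of a correlation matrix R."""
    eig = np.sort(np.linalg.eigvalsh(np.asarray(R, dtype=float)))[::-1]
    return float(eig[:n_factors].sum() / eig.sum())

print(variance_accounted_for([[1.0, 0.5], [0.5, 1.0]], 1))
```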


Singer, K., “Application of the Theory of 
Stochastic Processes to the Study of Irre- 
producible Chemical Reactions and Nuclea- 
tion Processes,” Journal of the Royal Sta- 
tistical Society, Series B, 15 (1953), 92-106. 

Let $n_1, n_2, \ldots, n_r$, symbolically denoted
by the vector $\mathbf{n}$, be the numbers of molecules
of the different molecular species 1, 2, …, r
in a reacting system of constant volume.
Suppose the system is subject to random
variation and the composition is
characterized by $P(\mathbf{n}; t)$, the probability that
the system has the composition $\mathbf{n}$ at the time
t. Difference-differential equations involving
probabilities of changing from one
composition to another in a given time are
derived and studied, as are also equations
involving “first passage” times and
“recurrence” times. Several applications to
the study of chemical reactions are given.
Paul N. Somerville, Virginia Polytechnic
Institute.


Singh, R. P., and Nagar, D. N., “A Study
on the Growth of Population in Rajasthan,”
Sankhyā, 13 (1953), 39-42.


Some data for the Rajputana states are
taken from the census reports of the 
period 1901-1941 and studied with regard 
to the number of married females in differ- 
ent age-groups, reproduction according to 
age groups, distribution of married females 
of reproductive age, average number of 
children born, the number survived, sex 
ratio and increase in population. R. L. 
Wine, Virginia Polytechnic Institute. 


“The National Sample Survey: General 
Report No. 1,” Sankhyā, 13, Parts 1 and 2
(1953), 47-218. 


This paper reports on the National Sam- 
ple Survey of India covering the period 
October 1950 to March 1951. The survey 
was conducted to supply reliable statistics 
relating to production, consumption and 
other aspects of economic and social life in 
India. Data were obtained on size of rural 
households, per capita consumer expendi- 
ture in rural areas, expenditure on food, 
expenditure on clothing and head and foot- 
wear, and medical and ceremonial ex- 
penses. Appendix 2, pp. 136-198, contains 
tables reporting on the data collected under 
the above noted general headings. Ap- 
pendix 3, pp. 197-214 contains facsimile 
field schedules. 

The design of the survey is discussed 
and some notes are included relating to 
changes to be made in the second round of 
sampling. Different methods of selecting 
the sampling units were adopted in different 
parts of the country and the probability 
of being included in the sample differed 
from region to region. Sampling units were 
selected in two stages: first the villages were 
selected after suitable stratification; within 
each sample village all or a subsample of 
80 households, whichever was less, were 
stratified into agricultural and nonagricul- 
tural classes and sample households were 
then selected at random from each of these
strata. Ralph A. Bradley, Virginia
Polytechnic Institute.


Whittle, P., “The Analysis of Multiple 
Stationary Time Series,” Journal of the 
Royal Statistical Society, Series B, 15 (1953), 
125-39. 

The author extends his earlier application
of the least squares principle from the
analysis of a single stationary time series
to the analysis of a multiple series. For a
purely nondeterministic stationary multiple
process the least squares estimation
equations are derived. For a normal process
the asymptotic covariances of the parameter
estimates are calculated. The methods
developed are illustrated by the testing of a
sunspot model. G. L. Edgett, Virginia
Polytechnic Institute.


Wold, Herman O. A., “Causality and Econ- 
ometrics,” Econometrica, 22 (1954), 162- 
77. 


Following a few general remarks on the 
concept of causality, definitions are pro- 
posed which are considered useful in prob- 
lems involving relations between variables. 
Attention is given both to the nonstatis- 
tical (exact relations) and statistical points 
of view, but the discussion centers
primarily on the latter. Consider the general
relation $y = f(z_1, \ldots, z_n) + \varepsilon$, where $\varepsilon$
represents the disturbance term. In a
controlled experiment the z’s represent
control variables and y the effect variable.
With proper design and analysis this may be
interpreted as a causal relation. In the case
of nonexperimental observations (for
example, econometric analysis of time series
data) a relation like that above is defined as
causal if “it is theoretically permissible to
regard the variables as involved in a fictive
controlled experiment with $z_1, \ldots, z_n$ for
cause variables and y for effect variable.”
Proper specification and analysis then
permit the causal interpretation. Within
the framework of the proposed definition, a
few simple illustrative economic models
are discussed from the econometric point of
view. The causal interpretation in the
illustrative relations is discussed primarily
within the framework of recursive systems.
Employing the concept of a link set, the
author extends the recursive model to
include the case where one or more effect
variables in a given “link” (endogenous
variables at point t) are jointly causally
explained by variables in “previous links”
(lagged endogenous and lagged or current
exogenous variables). Ivan M. Lee,
University of California.


Woolsey, Theodore W., “On the Use of 
Sampling in the Field of Public Health,” 
American Journal of Public Health, 44 (1954), 
719-40. 


The American Public Health Association 
Statistics Section had its Committee on 
Sampling Techniques prepare this valuable 
article on the uses of sampling for public 
health workers. The discussion is broad 
enough to include applications of sampling 
in all fields. The manuscript describes 
when and how sampling may be put to ad- 
vantage, its reliability, and also those 
situations for which sampling is not a help. 
Probability sampling is discussed and sev- 
eral illuminating illustrations are presented. 
In the appendix, a selected bibliography 
on sampling is given with a list of recent 
references in which probability sampling 
was used to solve public health problems. 
Bernard G. Greenberg, University of
North Carolina. 


Yates, F., and Grundy, P. M., “Selection 
Without Replacement from Within Strata 
with Probability Proportional to Size,” 
Journal of the Royal Statistical Society, 
Series B, 15 (1953), 253-69. 

In sampling without replacement with 
probability proportional to size, the usual 
formula for estimation of a stratum variate 
by weighting the units in inverse propor-
tion to the size of the units is biased. Nu-
merical examples are given to show that the 
bias is small. A formula for unbiased esti- 
mates is given, which, however, for samples 
of size greater than two involves consider- 
able labor. 

The bias in the ordinary formula for the 
estimation of error is investigated and also 
is found to be small. An unbiased estimate 
of error is given which is shown to be more 
efficient than that given by Horvitz and
Thompson. 

A method of revising size measures so 
that, with the usual method of selection, the 
true total probabilities of selection are 
proportional to the original size measures is
given for samples of size 2. 

Criticism is given of the practice of se- 
lection of successive members of a sample 
with sets of probabilities chosen solely so 
that the total probabilities shall be propor- 
tional to the original size measures. Paul
N. Somerville, Virginia Polytechnic In-
stitute.
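The bias of the usual formula, and the unbiasedness of a properly weighted estimator, can be checked by enumerating every possible sample of size 2 from a small stratum. The sketch below is a hypothetical illustration: the size measures `p` and values `y` are invented; selection is assumed to follow the usual method (first unit with probability p_i, second with probability p_j/(1 − p_i)); the unbiased estimator of the total is the Horvitz-Thompson estimator, swapped in here for convenience; and the variance estimate uses the form now generally associated with Yates and Grundy, (π_i π_j − π_ij)/π_ij · (y_i/π_i − y_j/π_j)².

```python
from itertools import permutations

def pair_probabilities(p):
    """Probability of each unordered sample {i, j} when the first unit is drawn
    with probability p[i] and the second with probability p[j] / (1 - p[i])."""
    pij = {}
    for i, j in permutations(range(len(p)), 2):
        key = (min(i, j), max(i, j))
        pij[key] = pij.get(key, 0.0) + p[i] * p[j] / (1 - p[i])
    return pij

def inclusion_probabilities(pij, n_units):
    """Total probability that each unit appears in the sample."""
    pi = [0.0] * n_units
    for (i, j), prob in pij.items():
        pi[i] += prob
        pi[j] += prob
    return pi

def usual_estimate(s, y, p):
    """Units weighted inversely to the original size measures."""
    i, j = s
    return (y[i] / p[i] + y[j] / p[j]) / 2

def ht_estimate(s, y, pi):
    """Horvitz-Thompson estimate of the stratum total."""
    i, j = s
    return y[i] / pi[i] + y[j] / pi[j]

def yg_variance(s, y, pi, pij):
    """Yates-Grundy form of the variance estimate for a sample of two."""
    i, j = s
    return (pi[i] * pi[j] - pij[s]) / pij[s] * (y[i] / pi[i] - y[j] / pi[j]) ** 2

# Hypothetical stratum: size measures p (summing to 1) and values y.
p = [0.1, 0.2, 0.3, 0.4]
y = [1.0, 3.0, 2.0, 5.0]
total = sum(y)
pij = pair_probabilities(p)
pi = inclusion_probabilities(pij, len(p))

# Exact expectations, by enumerating every possible sample of two.
e_usual = sum(pr * usual_estimate(s, y, p) for s, pr in pij.items())
e_ht = sum(pr * ht_estimate(s, y, pi) for s, pr in pij.items())
true_var = sum(pr * (ht_estimate(s, y, pi) - total) ** 2 for s, pr in pij.items())
e_yg = sum(pr * yg_variance(s, y, pi, pij) for s, pr in pij.items())
```

With these invented figures the usual formula has expectation about 10.984 against a true total of 11, a small bias of the kind the abstract describes, while `e_ht` equals the total and `e_yg` equals the true variance.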
Sexual Behavior in the Human Female. Alfred C. Kinsey, Wardell B. Pomeroy,
Clyde E. Martin, and Paul H. Gebhard. Philadelphia and London: W. B. Saun- 
ders Company, 1953. Pp. 842. $8.00. 


See review article by Dorothy S. Brady, on pages 696-705.


Statistical Method in Industrial Production. Thirteen Papers plus Foreword by 
A. Bradford Hill given at a Conference held by the Industrial Applications Sec- 
tion of the Royal Statistical Society in Sheffield in 1950. London: 1951. Pp. iv, 
89. 7s 6d. 


Lloyd A. Knowler, State University of Iowa


“Early Experiences of Statistical Quality Control in a Pottery,” by Arthur
G. Ellis, is a case history of two applications in the manufacture of pottery:
pint weight of slip and dry modulus of rupture. The tremendous benefits
which can result in the industry from the use of statistical quality control
chart techniques are indicated. Also, the importance of having missionary
work at or about the foreman level is noted.

“Applications of Control Charts to Brick Manufacture,” by T. G. W. 
Boxall, describes an application of quality control charts in an industry in 
which it is impossible to make big changes in the source of raw material and 
which is concerned with the manufacture of a cheap, mass-produced article. 
Through the measurement of but six bricks of about 40,000 burnt in a kiln, 
it was discovered that a simple modification to the feeding mechanism of the 
presses would result in bricks which would easily meet the specification 
limits set by the British Standards Institution. Another study indicated 
some minor changes desirable in anticipation of shrinkage characteristics. 
Following these two studies, the control chart techniques have been ex- 
panded so as to consider crushing strength, weight, and absorption of bricks, 
as well as the quality of bricks made in different machines and burnt in dif- 
ferent kilns. Also, the technique has been expanded to compare the efficiency 
of various brickmaking operations and firing methods. 
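As a reminder of the arithmetic behind such charts, here is a minimal sketch of the usual X̄ and R chart limits for subgroups of five, using the standard tabulated constants A2 = 0.577, D3 = 0 and D4 = 2.114. The brick weights are hypothetical and the function names illustrative; Boxall's own charts may have been set up differently.

```python
# Standard Shewhart control-chart constants for subgroups of size 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Return (lower, centre, upper) limits for the X-bar and R charts."""
    if any(len(g) != 5 for g in subgroups):
        raise ValueError("the constants above apply to subgroups of five only")
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)
    r_bar = sum(ranges) / len(ranges)
    return {
        "xbar": (grand_mean - A2 * r_bar, grand_mean, grand_mean + A2 * r_bar),
        "range": (D3 * r_bar, r_bar, D4 * r_bar),
    }

def points_outside(subgroups, limits):
    """Indices of subgroups whose mean falls outside the X-bar limits."""
    lower, _, upper = limits["xbar"]
    return [k for k, g in enumerate(subgroups)
            if not lower <= sum(g) / len(g) <= upper]

# Hypothetical weights (arbitrary units) of five bricks from each of five kilns.
bricks = [
    [10, 11, 9, 10, 10],
    [10, 12, 10, 9, 9],
    [11, 10, 10, 9, 10],
    [9, 10, 11, 10, 10],
    [12, 13, 12, 13, 12],
]
limits = xbar_r_limits(bricks)
flagged = points_outside(bricks, limits)  # only the last kiln is flagged
```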

“Contributions of Statistics to Problems of Chocolate Manufacture,” by 
B. Moorhouse, describes a study of the manufacture of moulded chocolate 
block containing a “centre” which, on completion, was showing more varia- 
tion in weight than was desired. It was noted that the three main lines of
attack in bringing the manufacturing process under control were efforts to
reduce (1) variation between moulds, (2) variation between cake positions
within the mould, and (3) variation between day-to-day runs. It is pointed
out that the best way to acquire further knowledge of control chart work is
to make as many applications as possible, and that conclusions drawn from
experiments carry more weight when the data are examined by statistical
techniques.

“Productivity Measurement in the U.S.A.,” by H. Ingham, “analyses the
differences in Great Britain and the U.S.A. in attitudes to productivity, and
urges that [Great Britain] should energetically engage in certain specific
statistical investigations of productivity.” Among the points made are the
following: (1) Productivity seems to have become a national myth in the
U.S.A.; in fact, it is pointed out that business men and trade unionists would
become seriously concerned if the figures showed that productivity had not
increased at a rate of about 3 per cent per annum. (2) People in the U.S.A.
are interested in “man-hours required per unit of output.” (3) In the U.S.A.
there is an overwhelming emphasis on the “down to earth” type of person,
while the reverse seems to be true in Great Britain.

“Costing of Continuous Processes,” by Philip Lyle, illustrates the application
of statistical methods to the determination of the average effect upon the 
costs of a factory, department, or process of a change in output. It is shown 
that a measure of the total variation amongst a series of weekly cost figures 
can be divided into (1) the amount of this variation which can be ascribed to 
changes in output, and (2) the amount due to unknown factors or “error.” 
The knowledge of the error component enables one to predict the costs based 
upon various outputs. Marginal cost is discussed. It is shown that the mar- 
ginal concept only applies for short-run variations, and that long-run varia- 
tions take place in finite steps for which “arc cost” must be used in place of 
“marginal cost.” 
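The division of cost variation described here amounts to a least-squares regression of weekly cost on output. A minimal sketch, assuming a straight-line relation over the observed range and using invented weekly figures: the fitted slope plays the part of an average (arc) cost per additional unit of output over that range, and the residual mean square measures the “error” component used in prediction. This is an illustration, not Lyle's own computation.

```python
def cost_output_regression(output, cost):
    """Fit cost = a + b * output and split the total sum of squares."""
    n = len(output)
    mx, my = sum(output) / n, sum(cost) / n
    sxx = sum((x - mx) ** 2 for x in output)
    sxy = sum((x - mx) * (y - my) for x, y in zip(output, cost))
    b = sxy / sxx        # average extra cost per extra unit over this range
    a = my - b * mx      # intercept; not meaningful far outside the range
    fitted = [a + b * x for x in output]
    ss_total = sum((y - my) ** 2 for y in cost)
    ss_output = sum((f - my) ** 2 for f in fitted)              # ascribed to output
    ss_error = sum((y - f) ** 2 for y, f in zip(cost, fitted))  # unknown factors
    return {"a": a, "b": b, "ss_total": ss_total, "ss_output": ss_output,
            "ss_error": ss_error, "error_variance": ss_error / (n - 2)}

# Hypothetical weekly output (tons) and cost (pounds sterling).
output = [100, 120, 90, 130, 110, 105, 125, 95]
cost = [520, 590, 480, 625, 555, 530, 600, 500]
fit = cost_output_regression(output, cost)
predicted_cost_at_115 = fit["a"] + fit["b"] * 115
```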

“Graphical Analysis of Variations as a Production Department Tool,” by 
E. A. G. Knowles and C. Roseman, shows, by means of an example, how a 
graphical analysis of the results of a complicated experiment can be carried 
out by members of a production department familiar with control charts. 
In the example, all effects show up on the control charts, and they are made 
available immediately to all concerned. It is indicated that final tests of sig- 
nificance, such as analysis of variance, can be performed by the statistical 
section of an organization. The results of the graphical analysis are com-
pared with those of analysis of variance.

“Comparative Tests in a Single Laboratory,” by W. J. Youden, makes the 
point that “It is a fact of experience that a set of measurements made by 
different operators at different times or in different localities is subject to 
greater variation than a set of measurements made by one operator using the 
same apparatus on the same day. The data from an experiment with thermom- 
eters have been used to show that even as simple an operation as reading 
a linear scale cannot be duplicated nearly as well after a time lapse as on the 
same day. The paper emphasizes the necessity of including in the error of the 
test all sources of variation which in fact operate on the measurements and 
shows how a change in the design of the test may reduce the error.” 
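Youden's point can be made concrete by splitting the variation in repeated readings into a within-day and a between-day component. The sketch below uses the ordinary one-way analysis of variance for equal group sizes with invented thermometer readings (three days, four readings a day); it illustrates the general idea and does not reproduce the design of Youden's experiment.

```python
def variance_components(groups):
    """One-way random-effects ANOVA with equal group sizes.
    Returns (within-group variance, between-group variance component)."""
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("groups must all have the same size")
    g = len(groups)
    means = [sum(x) / k for x in groups]
    grand = sum(means) / g
    ss_within = sum((v - m) ** 2 for x, m in zip(groups, means) for v in x)
    ss_between = k * sum((m - grand) ** 2 for m in means)
    ms_within = ss_within / (g * (k - 1))
    ms_between = ss_between / (g - 1)
    # Truncate at zero: a negative estimate means no detectable day effect.
    return ms_within, max(0.0, (ms_between - ms_within) / k)

# Hypothetical readings (deg C) of one bath, four per day on three days.
days = [
    [20.1, 20.3, 20.2, 20.2],
    [20.6, 20.5, 20.7, 20.6],
    [19.9, 20.0, 19.8, 19.9],
]
within, between = variance_components(days)
```

Here the between-day component (about 0.12) is roughly eighteen times the within-day variance (about 0.0067), so an error estimate drawn from same-day replicates alone would badly understate the variation met in practice.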

“The Statistical Approach to Time Study,” by D. J. Desmond, “first 
gives a brief historical survey of the development of Time Study and pro- 
ceeds to show how the technique of rating has become a part of modern time 
study in many industries. The methods of selecting the normal time for a
job are then discussed, and it is shown that until recently there was no ob-
jective way of determining the quality of any particular time study or meas-
uring the accuracy with which any time study observer is working. 

“A new method of analysis is then developed, based on regression analysis, 
which gives an objective determination of the normal time of a job in terms 
of the recorded times and subjective estimates of the observer. This will give 
an estimate of the unknown normal time, and the precision of this estimate 
can be calculated and compared with the results obtained by other observers 
studying the same job. The various defects in a study can be calculated in 
terms of three different parameters which establish the standard of quality of 
the studies of the observer. Plotting these statistics on control charts enables 
the observer to determine, at a glance, whether he is maintaining the quality 
of his work, and to see if he is achieving any improvement. A simple graph- 
ical method is described which enables him to estimate all the characteristics 
of his study in less time than he usually takes merely to determine his nor- 
mal time. 

“The method is then developed, by the analysis of variance technique, 
to enable any number of studies to be combined. This can lead to the estab- 
lishment of a standard of quality for a group of observers, and the signifi- 
cance of the differences between individual observers can be examined. 
These differences are illustrated by the results of an experiment carried out 
on the floor of an assembly shop.” 

“Problems of Even Flow in Production,” by E. D. Van Rest, deals with 
what are sometimes called “congestion” problems, which arise in providing a 
service when the need arises at random intervals of time. Such problems are 
frequent in industry; for example, one operator tending several machines or 
one machine tended by two or more operators. In fact, the problem is im- 
portant in planning an even flow of work through a production process be- 
cause the various accidents which are liable to occur delay progress. The 
particular problem considered is typified by the spinning frame of a cotton 
mill where one person looks after a large number of spindles. The thread of 
each spindle may occasionally break and need repair. The type of informa- 
tion required is described, as well as the use to be made of it. Similar prob- 
lems have received some attention in operations research, as well as in qual- 
ity control. 
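For orientation, the classical finite-source queue (often called the machine-interference model) gives a closed form for the simplest version of the spindle problem. The sketch assumes, as that textbook model does, that each running spindle breaks at a constant rate, that repair times are exponentially distributed, and that one operator repairs one break at a time; Van Rest's treatment may rest on different assumptions, and the rates in the example are invented.

```python
def machine_interference(n_machines, break_rate, repair_rate):
    """Probabilities that k = 0..N machines are stopped, for N machines
    tended by one operator (finite-source single-server queue)."""
    rho = break_rate / repair_rate
    weights, w = [], 1.0
    for k in range(n_machines + 1):
        weights.append(w)              # w = N!/(N-k)! * rho**k
        w *= (n_machines - k) * rho
    total = sum(weights)
    return [w / total for w in weights]

def summary(n_machines, break_rate, repair_rate):
    """Operator utilisation, mean machines stopped, and machine efficiency."""
    p = machine_interference(n_machines, break_rate, repair_rate)
    stopped = sum(k * pk for k, pk in enumerate(p))
    return {
        "operator_busy": 1 - p[0],
        "mean_stopped": stopped,
        "machine_efficiency": (n_machines - stopped) / n_machines,
    }

# Hypothetical mill: 20 spindles, each breaking on average once per
# 100 minutes of running; a repair takes 2 minutes on average.
stats = summary(20, break_rate=1 / 100, repair_rate=1 / 2)
```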

“Statistics Applied to Assembly Process,” by G. A. Barnard, considers 
the tolerance limits of an assembled article as related to those of the com- 
ponents. In particular, consideration is given to the following types of as- 
sembly process: (1) random or interchangeable, (2) semi-random, (3) simple 
selective, and (4) multiple selective. The need for a mathematically trained 
statistician in any fair-sized plant, to form a link between the production
and cost departments, is observed. 
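For random (interchangeable) assembly the usual rule of thumb takes only a few lines. The sketch assumes that component dimensions vary independently and that each tolerance is the same multiple of its component's standard deviation (for example ±3σ); under those assumptions the tolerance of an additive stack grows as the square root of the sum of squared component tolerances rather than as their simple sum. The figures are hypothetical, and Barnard's paper treats further cases.

```python
import math

def stacked_tolerance(component_tolerances):
    """Return (worst-case, statistical) tolerance of a dimension formed by
    adding independent component dimensions."""
    worst_case = sum(component_tolerances)
    statistical = math.sqrt(sum(t * t for t in component_tolerances))
    return worst_case, statistical

# Two hypothetical parts with tolerances of +/-0.03 and +/-0.04 inch.
worst, stat = stacked_tolerance([0.03, 0.04])
```

Worst-case stacking gives ±0.07 inch; the statistical figure is ±0.05 inch.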

In “The Cost of Inspection,” by F. J. Anscombe, “assessment of the total 
cost of an inspection procedure is considered, taking into account the cost of 
decisions made on the basis of the inspection. Simple hypothetical process 
curves, inspection cost curves, and decision loss curves, are described. A 
numerical example of rectifying inspection is considered in some detail, and
the relevance of the Dodge-Romig concepts of AOQL and lot tolerance to 
such problems is discussed.” 
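The Dodge-Romig quantities mentioned here are easy to compute for a single sampling plan. The sketch assumes rectifying inspection in which rejected lots are screened completely and every defective found is replaced by a good item, so that the average outgoing quality is AOQ(p) = p · Pa(p) · (N − n)/N, with Pa the binomial probability of acceptance; the AOQL is the maximum of this curve over p. The plan parameters are illustrative only.

```python
from math import comb

def prob_accept(p, n, c):
    """Binomial probability of finding at most c defectives in a sample of n."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c + 1))

def aoq(p, n, c, lot_size):
    """Average outgoing quality under rectifying inspection."""
    return p * prob_accept(p, n, c) * (lot_size - n) / lot_size

def aoql(n, c, lot_size, grid=2000):
    """Approximate the AOQL by scanning p over a grid on [0, 1]."""
    return max(aoq(i / grid, n, c, lot_size) for i in range(grid + 1))

# Hypothetical plan: lots of 1000, sample 50, accept on at most 1 defective.
limit = aoql(50, 1, 1000)
```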

“Multiple Sampling in Theory and Practice,” by J. H. Enters and H. C. 
Hamaker, selects “from among the great variety conceivable such multiple 
plans . . . as can be presented to inspectors in the form of very simple in-
structions, in which use is made of the method of scoring proposed by Bar- 
nard for sequential sampling. For plans of this type the operating character- 
istics and the average sample size are computed assuming Poisson prob- 
abilities. A measure of efficiency, the inverse efficiency, is then obtained by 
dividing the average sample size by the sample size of a single sampling plan 
possessing practically an identical operating characteristic. The search for 
these equivalent single sampling plans is greatly facilitated by specifying the 
operating characteristics by their point of control, p₀, and their relative
slope, h₀, defined by

$$P(p_0) = \tfrac{1}{2}$$
and
$$h_0 = -p_0 \left(\frac{dP}{dp}\right)_{p = p_0},$$

where P(p₀) is the probability of accepting a lot in which the proportion of
defectives is p₀. On this basis a number of multiple sampling plans are in-
vestigated. Their efficiency is compared with that of double and sequential 
sampling, and the influence of the crudeness of the steps and of curtailing is 
systematically studied. The actual number of observations is a stochastic 
variable the distribution of which is separately considered. In a final section 
the experience gained in applying multiple sampling in a factory over a period 
of about three years is briefly recorded.” 
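For a single sampling plan with Poisson probabilities, both parameters can be computed directly. The sketch assumes the point of control is where P(p₀) = ½ and the relative slope is h₀ = −p₀ (dP/dp) at p₀. It finds p₀ by bisection and uses the Poisson identity dP/dp = −n e^(−np) (np)^c / c! for a plan with sample size n and acceptance number c. It does not reproduce the scoring-based multiple plans themselves, and the function names are illustrative.

```python
import math

def poisson_terms(c, m):
    """Return (P(X <= c), P(X = c)) for X ~ Poisson(m)."""
    term = math.exp(-m)
    total = term
    for k in range(1, c + 1):
        term *= m / k
        total += term
    return total, term

def control_point_and_slope(n, c, tol=1e-12):
    """Point of control p0 (acceptance probability 1/2) and relative slope h0
    for the single sampling plan (n, c), using Poisson probabilities."""
    lo, hi = 0.0, 2.0 * c + 10.0      # bracket for m = n * p0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if poisson_terms(c, mid)[0] > 0.5:
            lo = mid                  # acceptance still above 1/2: move right
        else:
            hi = mid
    m0 = (lo + hi) / 2
    # h0 = -p0 * dP/dp = m0 * P(X = c), evaluated at m = m0
    return m0 / n, m0 * poisson_terms(c, m0)[1]

p0, h0 = control_point_and_slope(n=50, c=1)
```

The inverse efficiency the abstract mentions would then compare a multiple plan's average sample size with the n of the single plan whose p₀ and h₀ match it.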

“Sequential Analysis of Machine Performance,” by B. H. P. Rivett, con- 
siders situations where the variation of a dimension of the product of a 
machine is sufficiently small compared with the tolerance, so that the ma- 
chine setting can have a zone within which it is free to move without de- 
fectives being produced. A method is given for determining (assuming cer- 
tain risks) whether the setting of a machine is in this zone. The method can 
be adapted to a lot-by-lot inspection scheme for acceptance of the product
with reference to the mean dimension. 
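A sequential check of this kind can be sketched with Wald's sequential probability ratio test for the mean of a normal distribution with known standard deviation. That textbook test is swapped in here for illustration: μ₀ is taken as the edge of the permissible zone for the machine setting, μ₁ as a setting that would produce defectives, and α, β are the assumed risks. The numbers are hypothetical, and Rivett's method may differ in detail.

```python
import math

def sprt_mean(observations, mu0, mu1, sigma, alpha=0.05, beta=0.10):
    """Wald SPRT of H0: mean = mu0 against H1: mean = mu1 (known sigma).
    Returns (decision, number of observations used)."""
    upper = math.log((1 - beta) / alpha)   # crossing it: decide for H1
    lower = math.log(beta / (1 - alpha))   # crossing it: decide for H0
    llr = 0.0
    for k, x in enumerate(observations, start=1):
        # log-likelihood-ratio increment for one normal observation
        llr += (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2)
        if llr >= upper:
            return "H1", k
        if llr <= lower:
            return "H0", k
    return "continue", len(observations)

# Hypothetical diameters (mm): zone edge at 10.0, bad setting at 10.5.
decision = sprt_mean([10.3] * 10, mu0=10.0, mu1=10.5, sigma=0.2)
```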


Research Methods in the Behavioral Sciences. Leon Festinger and Daniel Katz, 
editors. New York: The Dryden Press, 1953. Pp. xi, 660. $5.90. 


Daniel O. Price, University of North Carolina


This excellent volume might more appropriately have been titled
Research Methods in Social Psychology, for the actual title seems merely 
to capitalize on a new and popular term. As soon as the reader realizes that 
the authors are not trying to make social psychology synonymous with the
behavioral sciences, resentment dies out and the real merits of the book are 
more clearly seen. 

Following a short introduction on The Interdependence of Social-Psy- 
chological Theory and Methods: A Brief Overview (Theodore M. Newcomb), 
the volume is divided into five parts: Research Settings, Procedures for 
Sampling, Methods of Data Collection, The Analysis of Data, and The Ap- 
plication of Research Findings. 

Part I, Research Settings, deals with The Sample Survey, Field Studies, 
Experiments in Field Settings, and Laboratory Experiments. These chap- 
ters, each by a different author, are well integrated. 

Part II, Procedures for Sampling, has only one chapter, Selection of the 
Sample by Leslie Kish. In the reviewer’s opinion this is one of the best brief 
(65 pages) treatments of sampling that a research worker can find in the 
literature. It is sound and practical, even including a brief section on non- 
sampling errors. 

The Methods of Data Collection (Part III) includes Problems of Objective 
Observation; The Use of Documents, Records, Census Materials, and In- 
dices; The Collection of Data by Interviewing; and the Observation of Group 
Behavior. Had the book been written under the title which it now carries, 
we might have expected a chapter on case studies of individuals. The chapter 
on Problems of Objective Observation (Helen Peak) includes, among other 
things, comments on item analysis, comparisons of Thurstone, Likert, and 
Guttman scales, and discussions of validity and reliability. The chapter on 
The Collection of Data by Interviewing (Charles F. Cannell and Robert L. 
Kahn) includes not only material on the psychological basis of the interview 
and principles of interviewing but also material on questionnaire construc- 
tion, training of interviewers, and a detailed sample interview. The chapter 
on Observation of Group Behavior (Roger W. Heyns and Alvin F. Zander) 
deals with “two principle types of observation instruments: category systems 
and rating scales,” and deals only briefly with observational situations. 

Within a volume of generally high quality, Part IV, The Analysis
of Data, is probably the meatiest section of the book. The chapter on An-
alysis of Qualitative Material (Dorwin P. Cartwright) is an excellent pres- 
entation of how to develop and use a plan of content analysis or coding 
(the terms are used interchangeably). The Theory and Methods of Social 
Measurement (Clyde H. Coombs) is a chapter that gets at the very roots of 
social measurement, though it is so tightly written as to be quite heavy going 
in places. (Coombs uses the term “qualitative” in a different sense than does 
Cartwright in the preceding chapter.) Keith Smith’s chapter on Distribution- 
free Statistical Methods and the Concept of Power Efficiency is, among other 
things, an excellent collection and presentation of distribution-free statistical 
methods. 

The last part and chapter, The Utilization of Social Science (Rensis Likert 
and Ronald Lippitt), is a good discussion of the procedures, policies, and
problems involved in the application of research findings. 

All chapters include good bibliographies and lots of live, illustrative ma- 
terial. The book will, quite properly, find a wide market as a text and refer- 
ence book in research methods. 


Income and Wealth: Series III. Milton Gilbert, editor. Papers by Milton Gilbert, 
Shigeto Tsuru and Kazushi Ohkawa, Richard Stone, and Kurt Hansen, Tibor 
Barna, S. Herbert Frankel, Frederic Benham, V. K. R. V. Rao, Daniel Creamer,
Ingvar Ohlsson, and Francois Perroux, Georges Guilbaud, Jacques Mayer, Jean 
Albert, and Marcel Malissen. Cambridge: Bowes and Bowes, 1951. Pp. xiii, 261. 
Price 35s. 


Earl R. Rolph, University of California (Berkeley)


The volume contains ten papers delivered at the meeting of the Inter-
national Association for Research in Income and Wealth held at Royau- 
mont (France) in 1951. Two of the papers provide data on national income 
over a long period—for France since 1780 and for Japan since 1878. The de- 
tailed information these papers contain should be of especial interest to eco- 
nomic historians. Of the remaining papers, four apply social accounting con-
cepts to underdeveloped areas, three deal with conceptual and theoretical
topics, and one, likely to be of most interest to statisticians, is an analysis 
of the problem of the reliability of national income data. 

Milton Gilbert maintains, persuasively in my judgment, that the reli- 
ability of a national income component can be learned only by reviewing the 
sources of the data and the methods of estimation employed. Meaningful 
numerical measures of reliability cannot be provided and attempts to do so 
might easily be misleading. Independent estimates do not always increase 
reliability because in many cases one source of data is known to be definitely 
superior to any other. The great differences in the quality of the data out of 
which national income statistics are built mean in fact that national income
estimates of different countries are not truly comparable, even if the con- 
ceptual differences are unimportant. This observation lends added weight 
to the opinion of those who were dubious of the value of an international 
agreement on basic income concepts, especially since it comes from one who 
was an important participant in those conferences. 

The paper of Ingvar Ohlsson is devoted to that much discussed topic, 
the treatment of government activities in social accounting. There is little 
new information for those who have followed that literature, unless it is the 
resurrection of the plea for more attention to the purposes of constructing 
national accounts. If the plea were taken seriously, the construction of social 
accounts might be indefinitely postponed while debates were carried on as to 
what purposes are important. Presumably the purpose of intellectual work 
is to tell the truth as best one can regardless of how congenial or uncongenial 
the results may be. Pragmatism is also the tone of a longish paper by Richard
Stone and Kurt Hansen on inter-country comparisons. The authors succeed 
in arriving at definite conclusions, though one might wish that they would 
pause once in a while to inform their readers why the tests they select have 
relevance—why, for example, the effect on relative prices is a proper basis 
for distinguishing among taxes. It is hard to think of any government action 
that does not in fact affect relative prices. 

Tibor Barna extends relativism to economic theory; apparently we must 
have a different set of economic theories for every country. I was surprised to 
learn that in France, in contrast to Great Britain, it may be proper to treat 
the repayment of public debt as a part of national income because in France 
such repayments induce increases in private expenditures whereas in Britain 
they do not. Mr. Barna would, I think, have some difficulty in finding a con- 
sensus among British economists that monetary policy is completely un- 
workable in their country. But the compiler of national income statistics 
need not venture into the difficult questions of monetary policy. The repay- 
ment of any debt, including a public debt, is an exchange of assets—not an 
income transaction—in France, Great Britain, or India. Conceptual dis-
tinctions need to be kept apart from the determinants of behavior.

Of the four papers concerned with underdeveloped areas, Mr. Frankel’s 
is mainly an elaboration of the view that it is wrong to suppose that an in- 
crease in real national income can be assumed to increase welfare. With this 
position one may agree or disagree, but with his insistence that the mere cal- 
culation of national income involves an implicit acceptance of certain welfare 
notions, I at least cannot agree. Frederic Benham in his Comments provides
some sobering analysis of Frankel’s rather strongly, but not always clearly, 
stated remarks. V. K. R. V. Rao tackles the difficult problem of international 
comparisons of real income, refuting the common assumption that the real
incomes of less developed societies are comparatively understated because of 
their greater amount of household industry. One may endorse his recom- 
mendation that, with the present state of knowledge, the United Nations 
cease setting out figures purporting to be international income comparisons. 
The reader might find his remarks more convincing if he had avoided basing 
some conclusions on his own personal value judgments, such as that the 
real national income of the United States is overstated because we include 
the activities of the liquor business in the totals. Mr. Creamer in his paper 
cites chapter and verse for the advantages of having national income data 
for an underdeveloped area—Puerto Rico. He tempers his remarks with the 
warning that there are other and, in some cases, better ways to spend intel- 
lectual resources devoted to the study of underdeveloped areas than in 
estimating national income. His paper is informative. 

Papers delivered at a conference rarely make a satisfactory book. Care- 
ful editing would have made this a smaller and perhaps a better one. It is 
fervently to be hoped that in any future volume of this series, an index will 
be provided. 
Consumer Attitudes and Demand, 1950-1952. George Katona and Eva Mueller.
University of Michigan: Institute for Social Research, Survey Research Center 
Publication No. 12, 1953. Pp. v, 119. Paper $1.50; cloth $2.00.


Walter D. Fisher, Kansas State College


This empirical study reports on the buying behavior of United States
families in a period of prosperity immediately following inflation. In 
many ways it is “an extension of research into consumer attitudes, expecta- 
tions, and intentions initiated in the Surveys of Consumer Finances” (p. iii).
It is a pioneering work, using relatively new concepts having future promise, 
and at the same time a workmanlike job. Although the book is thin, the 
material inside is meaty. 

The basic hypothesis tested is that the amount of consumer spending on 
durable goods is influenced by certain “attitudinal” variables, including per- 
ceptions, expectations, and opinions as expressed by consumers themselves. 
These concepts, developed in some detail in Katona’s earlier book, Psycho- 
logical Analysis of Economic Behavior, are reviewed briefly in a theoretical 
chapter. Ample evidence is produced to establish this hypothesis, at least in 
the short run, although more attention is given to opinions of buying condi- 
tions than to actual purchases. Factors having most influence on these opin- 
ions are indicated to be consumers’ perceptions of price movements in the 
recent past, and their evaluations of the general economic outlook in the near 
future. 

Some of the most interesting findings concern prices and inflation. Con- 
sumers were definitely conscious of and resented the price increases of 1950 
and 1951, and these attitudes adversely affected their willingness to buy.
However, they did not fear inflation to the extent of making any appreciable 
shifts in the form of their savings from bonds to stocks, nor did they fear 
for the soundness of their money. 

Findings are based primarily on data from four successive interview-sur- 
veys, each a sample of approximately 1000 families representing all private 
dwelling units in the United States, taken about six months apart with the 
first one in June, 1951. Each sample was independently drawn by a process 
of four-stage area probability sampling, using the controlled selection fea- 
tures developed by the Sampling Section of the Survey Research Center. An 
appendix table contains convincing evidence that the four samples were 
nearly identical in a variety of demographic characteristics such as size and 
occupations of the families. The interviews, approximately an hour long, 
contained fixed questions with no latitude given to the interviewer regarding 
formulation and sequence, except for occasional probes of indefinite an-
swers.

The sample data are presented throughout in the form of percentages of 
responses, or of respondents, having certain attributes. Two major techniques 
are used: (1) answers to identical questions are tabulated separately for each 
time point, trends being inferred by making comparisons between findings 
at different times; and (2) answers to different questions—usually two at a 
time—are presented in contingency tables with all time periods pooled to-
gether, and relationships inferred between two factors at a time by noting 
differences in certain percentages. Although no reference to statistical signifi- 
cance is made in the text, the reader is able to make his own judgments by 
use of an excellent table of sampling errors in the appendix; and, in fact, 
most of the findings claimed are statistically significant by conventional 
standards. 

At times the use of coefficients of association or similar measures from the 
theory of attributes would have aided the reader in digesting the many 
arrays of percentages displayed. In some sections more use of joint relation- 
ships involving two or more independent variables would have been inter- 
esting and also more indicative of the relative importance of the various 
factors. 
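As an example of the kind of summary measure the reviewer has in mind, Yule's coefficient of association Q condenses a fourfold table into a single number between −1 and +1. The counts below are invented for illustration and do not come from the book.

```python
def yules_q(a, b, c, d):
    """Yule's Q for the fourfold table
                     attribute B   not B
        attribute A       a          b
        not A             c          d
    """
    return (a * d - b * c) / (a * d + b * c)

# Hypothetical: rows = "prices went up" yes/no,
# columns = "good time to buy" yes/no.
q = yules_q(30, 70, 60, 40)
```

Here 30 per cent of one group and 60 per cent of the other call it a good time to buy, and Q = −5/9 (about −0.56) summarises that contrast; a table with proportional rows gives Q = 0.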

Relationship between actual purchases and the other variables could have 
been claimed more effectively from the time-series comparisons alone. The 
procedure followed of seeking to establish a relationship between reported 
purchases “during the last 12 months” and the opinion “this is a good (or 
bad) time to buy” is not convincing—for reasons which the authors recog- 
nize: first, the “last 12 months” is rather a long time in the context of this 
study; second, respondents would tend to rationalize recent purchases, es- 
pecially since in the interviews the opinion expressed followed immediately 
the statement of purchases. 

The authors advance also a second more ambitious hypothesis: that the 
use of attitudinal variables significantly improves knowledge and predicting 
ability over what can be done by using non-attitudinal variables alone. “It 
is claimed here that the use of functional relationships between consumer 
attitudes (as well as traditional financial variables) and spending will in- 
crease the probability of correct predictions” (p. 58). The present volume 
alone does not establish this claim, and does not seem designed to do so. No 
comparisons are made between attitudinal and non-attitudinal variables as 
predictors, nor between non-attitudinal predictors as used alone and as used 
along with attitudinal ones. Moreover, no empirical evidence is presented 
that would contradict a hypothesis that all attitudinal variables are ulti- 
mately dependent on or caused by non-attitudinal ones. The possibility of 
admitting such a view seems to be entertained by the authors when they 
state: “Changes in attitudes are rarely fortuitous. They are dependent on 
developments which induce people to restructure their thinking” (p. 57). 

It may well be found useful, in the formulation and testing of models of 
economic behavior, to introduce variables that cannot be classified clearly 
as “attitudinal” or “non-attitudinal.” Indeed, one of the most significant 
variables found in this study—the frequency of the opinion that prices went 
up in the recent past—conceptually “does not represent an attitude toward 
economic matters in the strictest sense of the term, registering instead a per- 
ception which might influence attitudes” (p. 46). Further research will de- 
termine whether other such borderline cases exist, and will also clarify the 
nature of the causal interaction between the variables—psychological and
otherwise—that enter into economic fluctuations and development. 

The real contribution of this book lies in its emphasis on matters that have 
been somewhat neglected in economic analysis—especially the importance 
of consumer demand in business cycles, and the role of psychological vari- 
ables; and also in its demonstration of the feasibility of representing such
variables by answers to questions in interview-surveys. Moreover, it helps 
fill a great current need for more empirical work and more interdisciplinary 
research in the social sciences. 


Cardano, The Gambling Scholar. Oystein Ore. Princeton, New Jersey: Princeton 
University Press, 1953. Pp. xiv, 249. $4.00. 


Meyer Dwass, Northwestern University 


This is a story of scholarship in the fascinating and fantastic Renaissance.
The scholar, Cardano, is presented in a light of sympathy and under-
standing, which, for him, is an aura distinctly new. There is, for instance,
the matter of Tartaglia and the cubic. E. T. Bell gives us a typical report 
(Development of Mathematics, 2nd edition, McGraw-Hill, p. 117): “Cardan 
. . . whose name ornaments the solution of the cubic in every intermediate
textbook on algebra, obtained the solution from Tartaglia under promise of 
secrecy and published it as his own in the Ars Magna (1545).” We get from 
Ore a distinctly new slant on what has become an old party line: Sometime 
before 1515, an Italian professor, Scipione del Ferro, invented a method 
to solve the equation $x^3 + ax = b$. As was then the custom, the result was
buried in secrecy. A favorite pastime of Renaissance academicians was a 
type of quiz contest with a heavy jackpot as well as points toward academic 
advancement for the winner. Hence, results such as Ferro’s were not as a 
rule published, but were kept as secret weapons for these public disputes. It 
was in just such a public dispute, years later, in 1535, that Tartaglia redis- 
covered the method. Cardano, a physician of universal interests, was writing 
what he hoped would be the complete algebra of his day. Cardano succeeded 
in wrangling the result from Tartaglia, but only under the frustrating oath 
that it never be disclosed or published. This was in 1539. In the ten years
that followed Tartaglia’s rediscovery, still others rediscovered the method.
Moreover, Cardano and a pupil succeeded in finding methods for dealing 
with more general forms of the cubic. What is more important, Cardano un- 
earthed Ferro’s original result and priority. Thus, Cardano felt himself re- 
lieved of oath and duty. His Ars Magna, published in 1545, contained the 
method, a statement that it was given to him by Tartaglia, and also a state- 
ment allocating priority to del Ferro. Cardano stands vindicated. 

What should be of greater interest to statisticians is Cardano’s virtually 
overlooked role in the early history of probability. Ore promotes the thesis 
that the father was not Pascal but Cardano. Cardano was a passionate 
gambler and it was inevitable that his mathematical interests should lead 
him to theoretical speculations on the laws of chance. He had the miserable
habit, however, of writing down speculations on little scraps of paper, jotting 
down improvements, revisions, and random thoughts as they came to him. 
Eventually the scraps were published with insufficient rewriting or editing— 
a collection of facts and ridiculousness. Ore dissects these hitherto undis- 
sected writings in a triple role of mathematician, classicist, and detective. 
Among his conclusions are the following: Cardano understood and formu- 
lated the definition of probability of an event in terms of equally likely cases. 
He used this to compute correctly many of the probabilities for dice and 
other games. He also succeeded in computing many probabilities incorrectly. 
His main device in the latter would in modern terms read something like, 
$P(A \text{ or } B \text{ or } C \text{ or } \cdots) = P(A) + P(B) + P(C) + \cdots$. However, he fully
realized that this was an approximation which was often quite unsatisfactory.
He also evolved the “power law,” that the probability of n occurrences of an 
event A in n independent trials is $[P(A)]^n$.
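The three ideas credited to Cardano above can be contrasted in a few lines. The following Python sketch is our own illustration, not taken from Ore's book: the dice example and the function names (`prob`, `additive`, `exact_at_least_one`, `power_law`) are hypothetical choices made for clarity. It computes a probability from equally likely cases, shows why the additive rule is only an approximation (throws of a die are not mutually exclusive events, so summing their probabilities overcounts), and applies the power law for repeated independent trials.

```python
from fractions import Fraction
from itertools import product

# Equally likely cases: all 36 ordered outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability = favourable cases / all equally likely cases."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

p_total_seven = prob(lambda o: sum(o) == 7)  # 6 favourable cases out of 36

p_six = Fraction(1, 6)  # chance of a six on one throw of a fair die

def additive(n):
    """Additive rule: sum the single-throw probabilities over n throws."""
    return n * p_six

def exact_at_least_one(n):
    """Exact chance of at least one six: 1 - P(no six in n independent throws)."""
    return 1 - (1 - p_six) ** n

def power_law(p, n):
    """Power law: chance that an event of probability p occurs in all n independent trials."""
    return p ** n

for n in (1, 2, 4, 6, 7):
    # The additive rule reaches 1 at n = 6 and exceeds 1 at n = 7,
    # which no probability can do; the exact value stays below 1.
    print(n, additive(n), exact_at_least_one(n), round(float(exact_at_least_one(n)), 4))
```

For a single throw both rules agree; the gap widens as n grows, which is the sense in which the additive device is "often quite unsatisfactory."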

In this review (as in the book) are emphasized two highlights of Cardano’s 
life—the cubic and probability. But Cardano lived, loved, invented, gambled,
suffered, and died. Ore describes all this in a crisp and readable style. This 
is a book I recommend. 


Gamma Globulin in the Prophylaxis of Poliomyelitis: An evaluation of the 
efficacy of gamma globulin in the prophylaxis of paralytic poliomyelitis as used 
in the United States 1953. Public Health Monograph no. 20. Report of the
National Advisory Committee for the Evaluation of Gamma Globulin in the
Prophylaxis of Poliomyelitis, Public Health Service Publication No. 358, U. S.
Department of Health, Education, and Welfare. United States Government
Printing Office, Washington: 1954. For sale by the Superintendent of
Documents, U. S. Government Printing Office, Washington 25, D. C.—$1.25, pp.
vi+178.


THE 1953 study reported here is not to be confused with the 1954 vaccine
trials for the control of poliomyelitis. 

Experiments by Hammon and associates based on 12 gamma-globulin- 
inoculated cases and 16 gelatin-inoculated cases suggested that gamma 
globulin might be useful in modifying the severity of poliomyelitis, or even 
in preventing it (p. 3). 

A national study during 1953 was conducted by the Communicable Dis- 
ease Center of the Public Health Service, planned and guided by a National 
Advisory Committee (p. 1); during the 1953 summer 235,000 children were 
inoculated in cities and communities where there were outbreaks of polio- 
myelitis. It is said (p. 1) that “the records of cases collected in this study 
have a greater accuracy, consistency, and validity than any that have been 
collected on such an extensive scale heretofore.” “The committee recognized 
that it would be very difficult to conduct rigidly controlled studies in the 
United States during 1953” (p. 3). “. . . the committee recommended four 
approaches to the problem: 
“1. Descriptive epidemiologic studies for each of the areas where mass use
of gamma globulin was employed. 

“2. A comparison of the severity of paralysis of patients developing the 
disease immediately before mass use with the severity of those acquiring the 
disease after receiving gamma globulin. 

“3. Study of the severity of paralysis among multiple-case households;. . . 

“4. The documentation of administrative aspects of the distribution of 
gamma globulin” (p. 3). 

Appendix B gives reports of epidemiological investigations in thirteen 
mass inoculation areas, 1953. An evaluation was based on: (1) asymmetry 
of epidemic curves; (2) shift in age distribution to older groups not receiving 
gamma globulin, this shift beginning after mass distribution; (3) modification 
in the duration of epidemics; and (4) differential attack rates. This evaluation 
turned out to be inconclusive for various technical reasons (pp. 10-18), and in 
any case was not very encouraging. 

The study of severity of paralysis in inoculated and uninoculated patients 
concluded (p. 21) “... its preventive effect in community prophylaxis as 
practiced during 1953 has not been demonstrated. Also, no modification of 
the severity of paralysis by gamma globulin was shown. Nevertheless, the 
committee cannot say that the use of gamma globulin by mass inoculation 
produced no effect.” The need for a more carefully controlled experiment is 
described. 

The multiple-case household study was regarded as adequate for reliable 
conclusions (p. 85): “They indicate that with the preparations employed and 
in the dosages used, the administration of gamma globulin to familial asso- 
ciates of patients with poliomyelitis had no significant influence on: 

“1. The severity of paralysis developing in subsequent cases. 

“2. The proportion of nonparalytic poliomyelitis among the subsequent 
cases who received gamma globulin before onset. 

“3. The classical pattern of familial aggregation of cases in the country at 
large.” 

The study of administrative problems may be of value in future work. 

Dr. Hammon comments on the study, appropriately reminding the 
reader of numerous limitations, including the lack of suitable controls. 
He feels that the modification issue has not been settled, and that the gamma 
globulin was given too late, but states that the “agent has an extremely 
limited application in the field of preventive medicine and will not produce 
dramatic results in general use” (p. 90). 

F.M. 





RANDOM DIGITS (20,876-21,875) 


With this issue, the Journal will discontinue publication of random digits. The complete set from 
which these have been taken is now being published for The Rand Corporation by The Free Press 
(Glencoe, Illinois) under the title A Million Random Digits. 


89024 32054 46997 92652 28363 
93573 95502 33790 92973 27766 
96035 18795 48080 59666 30241 
19264 29229 61369 08309 39383 
69801 37145 79189 55897 57793 


53514 21632 42301 23696 72641 
98540 23040 65782 23712 
86345 36795 38292 03852 
01363 06683 83891 88991 
61889 21693 12956 21804 


56310 93326 53264 41376 
10758 27270 45911 92453 
84421 36460 21262 59718 
09197 81502 38053 60780 
37682 68322 12475 40193 


64756 39278 51445 61182 
86602 48574 77854 77376 


14036 31033 63253 80257 
14255 33120 87651 82441 
04769 75920 98636 95895 


90334 00187 91659 79183 
08164 05584 36623 32547 
06501 08924 35514 28884 
48215 06821 03385 97978 
58499 17176 55993 09019 


96226 27167 68245 53109 
82590 52411 54783 29447 
62154 78291 33728 39102 
77108 56521 78610 08254 
47279 38471 20379 54704 


73087 17262 94735 04952 
38485 30594 56278 47395 
67874 78014 88381 04045 
07525 97908 61178 84635 
54782 58692 28332 41851 


15079 71230 34141 85002 
14613 98986 90945 45209 
42055 36113 53923 60824 
49817 42765 60660 13859 
75102 67649 73775 99247 